This supplemental document provides a transparent and reproducible framework for the data generation and methods used in our matching-adjusted indirect comparison (MAIC) analysis.
Specifically, this document:
Provides summary-level baseline and safety data from published studies (IMspire150, COLUMBUS, COMBI-d/COMBI-v, and coBRIM).
Demonstrates the process of extracting survival data from published Kaplan-Meier curves using digitization and reconstruction methods.
Presents a worked example of the MAIC process, including:
Simulation of individual patient-level data (IPD) for the NIVO+RELA cohort
Matching methodology
Example analysis of binary (objective response rate) and time-to-event (overall survival) outcomes
To maintain confidentiality, IPD from RELATIVITY-047 are simulated for illustrative purposes only. These data do not represent real patients or actual study results.
For simplicity, the worked example focuses on objective response rate (ORR) and overall survival (OS), but the same methodology is applicable to other outcomes. All analyses are conducted in R.
1.2 MAIC Process Schematic
Show R Code
library(ggplot2)
library(gridExtra)
library(grid)
library(gt)
library(dplyr)
library(survival)
library(survminer)

#######################################################################
# Create Tables and Plots
#######################################################################

# === Dummy KM Plot ===
combi_km_data <- tibble(
  time = c(0, 5, 10, 15, 20, 25, 30),
  surv = c(1, 0.9, 0.75, 0.6, 0.4, 0.2, 0.1)
)
combi_km_plot <- ggplot(combi_km_data, aes(x = time, y = surv)) +
  geom_step(size = 1.2, color = "steelblue") +
  theme_minimal() +
  labs(title = "Kaplan-Meier Curve") +
  theme(axis.title = element_blank(), axis.text = element_blank(),
        axis.ticks = element_blank(),
        plot.title = element_text(hjust = 0.5, face = "bold", size = 12))
combi_km_grob <- ggplotGrob(combi_km_plot)

# Dummy RELA KM curve
rela_km_data <- tibble(time = c(0, 5, 10, 15, 20), surv = c(1, 0.85, 0.7, 0.5, 0.3))
rela_km_plot <- ggplot(rela_km_data, aes(x = time, y = surv)) +
  geom_step(size = 1.2, color = "firebrick") +
  theme_minimal() +
  labs(title = "Kaplan-Meier Curve") +
  theme(axis.title = element_blank(), axis.text = element_blank(),
        axis.ticks = element_blank(),
        plot.title = element_text(hjust = 0.5, face = "bold", size = 12))
rela_km_grob <- ggplotGrob(rela_km_plot)

# Overlaid KM curves
overlay_km_plot <- ggplot() +
  geom_step(data = combi_km_data, aes(x = time, y = surv), color = "steelblue", size = 1.2) +
  geom_step(data = rela_km_data, aes(x = time, y = surv), color = "firebrick", size = 1.2) +
  theme_minimal() +
  labs(title = "MAIC Applied KM Curves") +
  theme(axis.title = element_blank(), axis.text = element_blank(),
        axis.ticks = element_blank(),
        plot.title = element_text(hjust = 0.5, face = "bold", size = 12))
overlay_km_grob <- ggplotGrob(overlay_km_plot)

# === Create Fake Survival Tables ===
pseudo_ipd_surv_df <- tibble(
  time = c(0.386, 0.802, 1.03, 1.302, 1.522),
  event = c(0, 1, 0, 1, 0)
)
rela_ipd_surv_df <- tibble(
  time = c(0.42, 0.88, 1.12, 1.45, 1.9),
  event = c(0, 1, 1, 0, 1)
)
pseudo_ipd_surv_grob <- tableGrob(
  pseudo_ipd_surv_df, rows = NULL,
  theme = ttheme_default(base_size = 6.8, padding = unit(c(0.5, 0.5), "mm"))
)
rela_ipd_surv_grob <- tableGrob(
  rela_ipd_surv_df, rows = NULL,
  theme = ttheme_default(base_size = 6.8, padding = unit(c(0.5, 0.5), "mm"))
)

# === Dataframes ===
rela_df <- tibble(
  ID = 1:4,
  Female = c(1, 0, 1, 0),
  Age = c(47, 52, 77, 61),
  ECOG = c(0, 0, 1, 0),
  Time = c(5, 10, 15, 20),
  Event = c(1, 0, 1, 1),
  trt = rep("NIVO+RELA", 4)
)
rela_df_weighted <- rela_df |> mutate(Weight = c(1.2, 0.8, 1.0, 1.1))

# Summary-level data from the COMBI trials
combi_df <- tibble(
  Female = 0.433,
  `Age <55` = 0.500,
  `ECOG 0` = 0.715,
  trt = rep("DAB+TRAM")
)

# === Convert tables and shrink them using theme ===
## RELATIVITY-047
rela_grob <- tableGrob(
  rela_df, rows = NULL,
  theme = ttheme_default(base_size = 6.8,                 # Keeps text readable
                         padding = unit(c(1, 1), "mm"))   # Reduce vertical and horizontal padding
)
rela_weighted_grob <- tableGrob(
  rela_df_weighted, rows = NULL,
  theme = ttheme_default(base_size = 6.8, padding = unit(c(1, 1), "mm"))
)
## COMBI
combi_grob <- tableGrob(
  combi_df, rows = NULL,
  theme = ttheme_default(base_size = 6.8, padding = unit(c(1, 1), "mm"))
)

#######################################################################
# Create Visualization Layers
#######################################################################

# === Base plot ===
schema_plot <- ggplot() +
  xlim(0, 20) +
  ylim(0, 10) +
  theme_void()

schema_plot <- schema_plot +
  # Data Acquisition & Processing background (left + middle columns)
  geom_rect(aes(xmin = 0.0, xmax = 10.0, ymin = 0.8, ymax = 9.2),
            fill = "#B3CDE3", alpha = 0.2, color = "black") +
  # Matched Analysis & Outcomes background (right column)
  geom_rect(aes(xmin = 10.0, xmax = 20, ymin = 0.8, ymax = 9.2),
            fill = "#F4A582", alpha = 0.2, color = "black")

# === Boxes === (uniform width: xmin = x, xmax = x + 4.75)
schema_plot <- schema_plot +
  # Published Trials
  geom_rect(aes(xmin = 0.125, xmax = 4.875, ymin = 5.1, ymax = 9), fill = "#BBDEFB", color = "black") +
  # RELATIVITY-047
  geom_rect(aes(xmin = 0.125, xmax = 4.875, ymin = 1, ymax = 4.9), fill = "#FFCDD2", color = "black")

# Pseudo and observed IPD column (second column)
schema_plot <- schema_plot +
  geom_rect(aes(xmin = 5.0, xmax = 9.875, ymin = 5.1, ymax = 9), fill = "#D7F7E4", color = "black") +
  geom_rect(aes(xmin = 5.0, xmax = 9.875, ymin = 1, ymax = 4.9), fill = "#D7F7E4", color = "black")

# Weighted column (third column)
schema_plot <- schema_plot +
  geom_rect(aes(xmin = 10.125, xmax = 14.875, ymin = 5.1, ymax = 9), fill = "#FFF0B3", color = "black") +
  geom_rect(aes(xmin = 10.125, xmax = 14.875, ymin = 1, ymax = 4.9), fill = "#FFF0B3", color = "black")

# Comparison column (fourth column)
schema_plot <- schema_plot +
  geom_rect(aes(xmin = 15, xmax = 19.875, ymin = 1, ymax = 9), fill = "#FFB74D", color = "black")

# Box behind "WebPlotDigitizer" label
schema_plot <- schema_plot +
  geom_rect(aes(xmin = 4.0, xmax = 6.0, ymin = 5.9, ymax = 6.5), fill = "#FFD580", color = "black")

# Box behind "Use Observed IPD" label
schema_plot <- schema_plot +
  geom_rect(aes(xmin = 4.0, xmax = 6.0, ymin = 2.18, ymax = 2.5), fill = "#FFD580", color = "black")

# Box behind "Matching Process" label
schema_plot <- schema_plot +
  geom_rect(aes(xmin = 8.9, xmax = 10, ymin = 4.38, ymax = 5.5), fill = "#FFD580", color = "black")

# === Kaplan-Meier curves ===
schema_plot <- schema_plot +
  # COMBI KM curve
  annotation_custom(combi_km_grob, xmin = 0.75, xmax = 4, ymin = 5.1, ymax = 6.6) +
  # Embed RELA KM curve under the IPD table
  annotation_custom(rela_km_grob, xmin = 0.75, xmax = 4, ymin = 1.1, ymax = 2.6) +
  # Embed overlaid KM curves in the comparison column
  annotation_custom(overlay_km_grob, xmin = 16, xmax = 19.25, ymin = 4.5, ymax = 6.2)

# === Embed shrunk grobs (summary-level and IPD data tables) ===
schema_plot <- schema_plot +
  # COMBI tables
  annotation_custom(combi_grob, xmin = 1.5, xmax = 3.5, ymin = 6.8, ymax = 7.8) +
  annotation_custom(combi_grob, xmin = 7.0, xmax = 8.0, ymin = 6.8, ymax = 7.8) +
  annotation_custom(combi_grob, xmin = 11.5, xmax = 13.5, ymin = 6.8, ymax = 7.8) +
  # RELATIVITY-047 tables
  annotation_custom(rela_grob, xmin = 1.5, xmax = 3.5, ymin = 2.7, ymax = 4.0) +
  annotation_custom(rela_grob, xmin = 6.5, xmax = 8.5, ymin = 2.7, ymax = 4.0) +
  annotation_custom(rela_weighted_grob, xmin = 12.0, xmax = 13, ymin = 2.7, ymax = 4.0)

# === Embed survival tables ===
# Add survival tables in the second column, matching the height of the KM curves
schema_plot <- schema_plot +
  # Pseudo-IPD survival table in the published-data green box
  annotation_custom(
    grob = grobTree(pseudo_ipd_surv_grob, vp = viewport(width = 0.7, height = 0.6)),
    xmin = 7, xmax = 8, ymin = 5.25, ymax = 6.4
  ) +
  # Pseudo-IPD survival table in the published-data yellow box
  annotation_custom(
    grob = grobTree(pseudo_ipd_surv_grob, vp = viewport(width = 0.7, height = 0.6)),
    xmin = 12, xmax = 13, ymin = 5.25, ymax = 6.4
  ) +
  # RELATIVITY survival table in the RELATIVITY-047 green box
  annotation_custom(
    grob = grobTree(rela_ipd_surv_grob, vp = viewport(width = 0.7, height = 0.6)),
    xmin = 7, xmax = 8, ymin = 1.3, ymax = 2.45
  ) +
  # RELATIVITY survival table in the RELATIVITY-047 yellow box
  annotation_custom(
    grob = grobTree(rela_ipd_surv_grob, vp = viewport(width = 0.7, height = 0.6)),
    xmin = 12, xmax = 13, ymin = 1.3, ymax = 2.45
  )

# === Arrows ===
schema_plot <- schema_plot +
  # Arrow under "WebPlotDigitizer", first to second column
  geom_curve(aes(x = 4.5, xend = 5.5, y = 5.8, yend = 5.8),
             arrow = arrow(length = unit(0.3, "cm")), curvature = 0) +
  # Arrow under "Use Observed IPD", first to second column
  geom_curve(aes(x = 4.5, xend = 5.5, y = 2, yend = 2),
             arrow = arrow(length = unit(0.3, "cm")), curvature = 0) +
  # First-row arrow between second and third columns
  geom_curve(aes(x = 9.5, xend = 10.5, y = 5.8, yend = 5.8),
             arrow = arrow(length = unit(0.3, "cm")), curvature = 0) +
  # Second-row arrow between second and third columns
  geom_curve(aes(x = 9.5, xend = 10.5, y = 2, yend = 2),
             arrow = arrow(length = unit(0.3, "cm")), curvature = 0) +
  # Curve from COMBI summarized table to weighted IPD
  geom_curve(aes(x = 9.4, xend = 10.1, y = 7.1, yend = 4.0),
             arrow = arrow(length = unit(0.3, "cm")), curvature = -0.3, color = "darkblue") +
  # Curve from RELATIVITY IPD to weighted IPD
  geom_curve(aes(x = 9.4, xend = 10.1, y = 4.0, yend = 4.0),
             arrow = arrow(length = unit(0.3, "cm")), curvature = -0.3, color = "darkblue") +
  # Curve from COMBI summarized table to MAIC KM
  geom_curve(aes(x = 14.99, xend = 16.0, y = 7.1, yend = 5.25),
             arrow = arrow(length = unit(0.3, "cm")), curvature = 0, color = "darkblue") +
  # Curve from COMBI survival table to MAIC KM
  geom_curve(aes(x = 14.99, xend = 16.0, y = 6, yend = 5.25),
             arrow = arrow(length = unit(0.3, "cm")), curvature = 0, color = "darkblue") +
  # Curve from RELATIVITY-047 summarized table to MAIC KM
  geom_curve(aes(x = 14.99, xend = 16.0, y = 3.5, yend = 5.25),
             arrow = arrow(length = unit(0.3, "cm")), curvature = 0, color = "darkblue") +
  # Curve from RELATIVITY-047 survival table to MAIC KM
  geom_curve(aes(x = 14.99, xend = 16.0, y = 2.5, yend = 5.25),
             arrow = arrow(length = unit(0.3, "cm")), curvature = 0, color = "darkblue")

# === Add labels ===
schema_plot <- schema_plot +
  # Background box - left label
  annotate("text", x = 5.75, y = 9.4, label = "Data Acquisition & Processing",
           size = 4, fontface = "bold") +
  # Background box - right label
  annotate("text", x = 15.5, y = 9.4, label = "Matched Analysis & Outcomes",
           size = 4, fontface = "bold") +
  # Published trial box labels (top row)
  ## First column
  annotate("text", x = 2.5, y = 8.9,
           label = "Published Trials\nSummary-Level Data &\nKaplan-Meier Curves",
           size = 4, fontface = "bold", vjust = 1, hjust = 0.5) +
  ## Second column
  annotate("text", x = 7.5, y = 8.9,
           label = "Published Trials\nSummary-Level Data &\n Pseudo-IPD Survival Table",
           size = 4, fontface = "bold", vjust = 1, hjust = 0.5) +
  ## Third column
  annotate("text", x = 12.5, y = 8.9,
           label = "Published Trials\nSummary-Level Data &\n Pseudo-IPD Survival Table",
           size = 4, fontface = "bold", vjust = 1, hjust = 0.5) +
  ## Fourth column
  annotate("text", x = 17.5, y = 8.9,
           label = "MAIC\n Published Trials &\nRELATIVITY-047",
           size = 4, fontface = "bold", vjust = 1, hjust = 0.5) +
  # RELATIVITY-047 labels
  ## First column
  annotate("text", x = 2.5, y = 4.75,
           label = "RELATIVITY-047\nIndividual Patient Data",
           size = 4, fontface = "bold", vjust = 1, hjust = 0.5) +
  ## Second column
  annotate("text", x = 7.5, y = 4.75,
           label = "RELATIVITY-047\nIndividual Patient Data",
           size = 4, fontface = "bold", vjust = 1, hjust = 0.5) +
  ## Third column
  annotate("text", x = 12.5, y = 4.75,
           label = "RELATIVITY-047\n Weighted IPD",
           size = 4, fontface = "bold", vjust = 1, hjust = 0.5) +
  # Published trials - summarized patient characteristics
  annotate("text", x = 2.5, y = 7.7, label = "Patient Characteristics - Summarized",
           size = 3.5, fontface = "bold", vjust = 1, hjust = 0.5) +
  annotate("text", x = 7.5, y = 7.7, label = "Patient Characteristics - Summarized",
           size = 3.5, fontface = "bold", vjust = 1, hjust = 0.5) +
  annotate("text", x = 12.5, y = 7.7, label = "Patient Characteristics - Summarized",
           size = 3.5, fontface = "bold", vjust = 1, hjust = 0.5) +
  # Observed IPD - individual patient data
  annotate("text", x = 2.5, y = 4.0, label = "Patient Characteristics - IPD",
           size = 3.5, fontface = "bold", vjust = 1, hjust = 0.5) +
  annotate("text", x = 7.5, y = 4.0, label = "Patient Characteristics - IPD",
           size = 3.5, fontface = "bold", vjust = 1, hjust = 0.5) +
  annotate("text", x = 12.5, y = 4.0, label = "Patient Characteristics - Weighted IPD",
           size = 3.5, fontface = "bold", vjust = 1, hjust = 0.5) +
  # Survival table headers (top survival tables)
  annotate("text", x = 7.5, y = 6.5, label = "Survival Table",
           size = 3.5, fontface = "bold", vjust = 1, hjust = 0.5) +
  annotate("text", x = 12.5, y = 6.5, label = "Survival Table",
           size = 3.5, fontface = "bold", vjust = 1, hjust = 0.5) +
  # Survival table headers (bottom survival tables)
  annotate("text", x = 7.5, y = 2.55, label = "Survival Table",
           size = 3.5, fontface = "bold", vjust = 1, hjust = 0.5) +
  annotate("text", x = 12.5, y = 2.55, label = "Survival Table",
           size = 3.5, fontface = "bold", vjust = 1, hjust = 0.5) +
  # Label for KM-to-survival-table arrow for COMBI
  annotate("text", x = 5.0, y = 6.4, label = "Extract Pseudo-IPD\n(WebPlotDigitizer)",
           size = 3, vjust = 1, hjust = 0.5) +
  # Label for KM-to-survival-table arrow for RELATIVITY-047
  annotate("text", x = 5.0, y = 2.4, label = "Use Observed IPD",
           size = 3, vjust = 1, hjust = 0.5) +
  # Label for the matching step
  annotate("text", x = 9.45, y = 5.4, label = "Matching\nProcess\n(Weights\nEstimated)",
           size = 3, fontface = "italic", color = "darkblue", vjust = 1, hjust = 0.5)

# === Title ===
schema_plot <- schema_plot +
  annotate("text", x = 10, y = 9.99, label = "Overall Schema of MAIC Process",
           size = 5.5, fontface = "bold", hjust = 0.5, vjust = 1)

# === Render ===
print(schema_plot)
1.2.1 Figure Legend: MAIC Process Schematic
This schematic provides a high-level overview of the MAIC framework, using the comparison of Nivolumab + Relatlimab (NIVO+RELA) and Dabrafenib + Trametinib (DAB+TRAM) as an illustrative example. The figure is designed to demonstrate key concepts in the process and does not capture every step involved in the full analysis. All data shown are simulated and intended solely for illustration. They do not represent actual patient-level data or trial results used in the analysis or manuscript.
1.2.1.1 Data Acquisition & Processing (Blue / Green Panels)
Published Trials: Summary-level patient characteristics and Kaplan-Meier (KM) curves are abstracted from published sources.
Pseudo-IPD Reconstruction: The KM curves are digitized using WebPlotDigitizer to generate pseudo-individual patient data (pseudo-IPD) with survival times and event indicators.
RELATIVITY-047 IPD: Individual patient-level data (IPD) from the RELATIVITY-047 trial serve as the observed dataset, including baseline characteristics and survival outcomes.
Survival Tables: Both datasets are converted into survival tables to prepare for matching.
1.2.1.2 Matched Analysis & Outcomes (Yellow / Orange Panels)
Matching Process: The RELATIVITY-047 IPD is reweighted to align baseline characteristics with the published comparator data. Estimated weights ensure comparability across covariates such as age, sex, and ECOG performance status.
Weighted IPD and Pseudo-IPD: Both the pseudo-IPD and the weighted RELATIVITY-047 data are prepared for outcome comparison.
MAIC Comparison: The adjusted overall survival (OS) and objective response rate (ORR) are compared between NIVO+RELA and DAB+TRAM using the reweighted datasets.
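The matching step described above can be sketched in a few lines of R. This is a minimal illustration of method-of-moments weighting (Signorovitch-style MAIC), not the analysis code itself: the matrix `X`, the invented cohort of 50 patients, and the `target_means` values are all illustrative stand-ins.

```r
# Minimal sketch of MAIC weight estimation (method of moments).
# X and target_means are invented illustrative inputs, not study data.
set.seed(1)
X <- cbind(age_above_55 = rbinom(50, 1, 0.45),
           female       = rbinom(50, 1, 0.50))
target_means <- c(age_above_55 = 0.404, female = 0.485)  # comparator summary stats

# Center the IPD covariates at the comparator means; minimizing
# sum(exp(X_c %*% alpha)) forces the weighted covariate means to the targets
X_c <- sweep(X, 2, target_means)
obj <- function(alpha) sum(exp(X_c %*% alpha))
alpha_hat <- optim(rep(0, ncol(X_c)), obj, method = "BFGS")$par
w <- as.vector(exp(X_c %*% alpha_hat))

# Rebalanced means now match the comparator population (up to optimizer tolerance),
# and the effective sample size quantifies the cost of reweighting
rebalanced <- colSums(X * w) / sum(w)
ess <- sum(w)^2 / sum(w^2)
```

In the full analysis, weights estimated this way feed into weighted ORR estimates and weighted Kaplan-Meier/Cox models for the time-to-event outcomes.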
1.2.1.3 Visualization Elements
Blue KM curves represent the comparator arm (DAB+TRAM) reconstructed from published data.
Red KM curves represent the RELATIVITY-047 IPD (NIVO+RELA).
Overlaid KM plot shows survival estimates post-MAIC adjustment.
Tables summarize patient characteristics and survival data at each step.
Arrows and labels denote the flow of data through extraction, weighting, and comparison.
2 Core Libraries for Analysis
Load Packages
2.1 Statistical & Visualization Toolkit
👇 Expand the code chunk below to view the packages necessary for this project
Show R Code
## Clear workspace for reproducibility
rm(list = ls())

# Function to load/install packages (avoids clutter & repetition)
load_pkg <- function(pkg) {
  if (!requireNamespace(pkg, quietly = TRUE)) install.packages(pkg)
  library(pkg, character.only = TRUE)
}

# Load core data manipulation and plotting packages
core_packages <- c(
  "tidyverse",            # dplyr, ggplot2, readr, etc.
  "data.table",           # Fast reading and manipulation
  "MASS",                 # Statistical models
  "splines",              # Spline functions for survival curves
  "survival",             # Survival analysis
  "survminer",            # Publication-ready survival plots
  "ggplot2",              # Visualization
  "scales",               # Axis scaling for ggplots
  "grid", "gridExtra",    # Grid layouts for complex plots
  "gt", "gtsummary",      # Beautiful tables
  "knitr", "kableExtra",  # Reporting & enhanced tables
  "here",                 # Reproducible project paths
  "jpeg",                 # Import JPEG images
  "sandwich",             # Robust standard errors
  "survey",               # Complex survey design / weighting
  "htmltools",            # HTML manipulation (for scrollable tables)
  "shiny"                 # Optional: Shiny app integration
)
Show R Code
# Load all core packages
lapply(core_packages, load_pkg)
3 Load Data and Simulate Patient-Level Dataset
Background and Sources
3.1 Data Overview
In this section, we load summary-level baseline and safety data from four published studies:
IMspire150
COLUMBUS
COMBI-d/COMBI-v
coBRIM
These datasets are provided as CSV files in the repository and are organized under the files directory.
We also generate a simulated patient-level dataset (pseudo-IPD) for the NIVO+RELA cohort. This dataset is used for demonstration purposes and does not represent real patient data.
Show R Code
# Establish the base path to the data directory
data_path <- here::here("files")
3.2 Load Summary-Level Data
IMspire150
Show R Code
subD <-"Summary level data"trial <-"atezo+vemu+cobi"# Step 1: Load the datasetimspire150 <-read_csv(file =paste0(data_path, "/", subD, "/", trial, "/SLD_safety.csv"))# Step 2: Create the GT table with bolded titleimspire150_table <- imspire150 %>%gt() %>%tab_header(title =html("<b>Preview of IMspire150 Summary Level and Safety Data</b>") ) %>%cols_align(align ="center", columns =everything())# Step 3: Render inside scrollable HTML divhtmltools::div(style ="overflow-x: auto; height: 400px;", imspire150_table)
Preview of IMspire150 Summary Level and Safety Data

| characteristics | trial | arm | mean | sd | type | n |
| --- | --- | --- | --- | --- | --- | --- |
| sex_female | IMspire150 (Ascierto 2022) | Atezolizumab + vemurafenib + cobimetinib | 0.41406250 | NA | categorical | 256 |
| sex_male | IMspire150 (Ascierto 2022) | Atezolizumab + vemurafenib + cobimetinib | 0.58593750 | NA | categorical | 256 |
| race_white | IMspire150 (Ascierto 2022) | Atezolizumab + vemurafenib + cobimetinib | 0.94921875 | NA | categorical | 256 |
| race_race_other | IMspire150 (Ascierto 2022) | Atezolizumab + vemurafenib + cobimetinib | 0.05078125 | NA | categorical | 256 |
| geo_europe | IMspire150 (Ascierto 2022) | Atezolizumab + vemurafenib + cobimetinib | 0.79296875 | NA | categorical | 256 |
| geo_north_americas | IMspire150 (Ascierto 2022) | Atezolizumab + vemurafenib + cobimetinib | 0.05078125 | NA | categorical | 256 |
| geo_other | IMspire150 (Ascierto 2022) | Atezolizumab + vemurafenib + cobimetinib | 0.15625000 | NA | categorical | 256 |
| ecog_ecog_0 | IMspire150 (Ascierto 2022) | Atezolizumab + vemurafenib + cobimetinib | 0.75781250 | NA | categorical | 256 |
| ecog_ecog_1 | IMspire150 (Ascierto 2022) | Atezolizumab + vemurafenib + cobimetinib | 0.24218750 | NA | categorical | 256 |
| age_above_54 | IMspire150 (Ascierto 2022) | Atezolizumab + vemurafenib + cobimetinib | 0.50000000 | NA | categorical | 256 |
| baseline_met_stage_m1b | IMspire150 (Ascierto 2022) | Atezolizumab + vemurafenib + cobimetinib | 0.22265625 | NA | categorical | 256 |
| baseline_met_stage_m01a | IMspire150 (Ascierto 2022) | Atezolizumab + vemurafenib + cobimetinib | 0.21484375 | NA | categorical | 256 |
| baseline_met_stage_m1c1d | IMspire150 (Ascierto 2022) | Atezolizumab + vemurafenib + cobimetinib | 0.56250000 | NA | categorical | 256 |
| ldh_cat_ldh_great_uln | IMspire150 (Ascierto 2022) | Atezolizumab + vemurafenib + cobimetinib | 0.32812500 | NA | categorical | 256 |
| ldh_cat_ldh_lower_equal_uln | IMspire150 (Ascierto 2022) | Atezolizumab + vemurafenib + cobimetinib | 0.67187500 | NA | categorical | 256 |
| bmhist_history_of_brain_metastases | IMspire150 (Ascierto 2022) | Atezolizumab + vemurafenib + cobimetinib | 0.01953125 | NA | categorical | 256 |
| bmhist_no_history_of_brain_metastases | IMspire150 (Ascierto 2022) | Atezolizumab + vemurafenib + cobimetinib | 0.98046875 | NA | categorical | 256 |
| immunotherapy | IMspire150 (Ascierto 2022) | Atezolizumab + vemurafenib + cobimetinib | 0.16015625 | NA | categorical | 256 |
| tum_burd_less_44 | IMspire150 (Ascierto 2022) | Atezolizumab + vemurafenib + cobimetinib | 0.42578125 | NA | categorical | 256 |
| tum_burd_great_44 | IMspire150 (Ascierto 2022) | Atezolizumab + vemurafenib + cobimetinib | 0.57421875 | NA | categorical | 256 |
| tum_burd_miss | IMspire150 (Ascierto 2022) | Atezolizumab + vemurafenib + cobimetinib | 0.00000000 | NA | categorical | 256 |
| ORR_investigator | IMspire150 (Ascierto 2022) | Atezolizumab + vemurafenib + cobimetinib | 0.66406250 | NA | categorical | 256 |
| AE_any | IMspire150 (Ascierto 2022) | Atezolizumab + vemurafenib + cobimetinib | 1.00000000 | NA | categorical | 256 |
| AE_g3 | IMspire150 (Ascierto 2022) | Atezolizumab + vemurafenib + cobimetinib | 0.74891775 | NA | categorical | 231 |
| AE_disc | IMspire150 (Ascierto 2022) | Atezolizumab + vemurafenib + cobimetinib | 0.38095238 | NA | categorical | 231 |
| ae_rash | IMspire150 (Ascierto 2022) | Atezolizumab + vemurafenib + cobimetinib | 0.45454545 | NA | categorical | 231 |
| ae_diarrhoea | IMspire150 (Ascierto 2022) | Atezolizumab + vemurafenib + cobimetinib | 0.50216450 | NA | categorical | 231 |
| ae_fatigue | IMspire150 (Ascierto 2022) | Atezolizumab + vemurafenib + cobimetinib | 0.30303030 | NA | categorical | 231 |
| ae_arthralgia | IMspire150 (Ascierto 2022) | Atezolizumab + vemurafenib + cobimetinib | 0.44588745 | NA | categorical | 231 |
| ae_pruritus | IMspire150 (Ascierto 2022) | Atezolizumab + vemurafenib + cobimetinib | 0.28138528 | NA | categorical | 231 |
COMBI-d and COMBI-v
Show R Code
subD <-"Summary level data"trial <-"dab+tram"# Step 1: Load the datasetcombiD_V <-read_csv(file =paste0(data_path, "/", subD, "/", trial, "/SLD_safety.csv"))# Step 2: Create the GT table with bolded title and centered columnscombiD_V_table <- combiD_V %>%gt() %>%tab_header(title =html("<b>Preview of COMBI-d and COMBI-v Summary Level and Safety Data</b>") ) %>%cols_align(align ="center", columns =everything())# Step 3: Render inside scrollable HTML divhtmltools::div(style ="overflow-x: auto; height: 400px;", combiD_V_table)
Preview of COMBI-d and COMBI-v Summary Level and Safety Data
3.3 Simulate Patient-Level Data for the NIVO+RELA Cohort
We consider all baseline characteristics that were used for matching in the MAIC in our main publication.
First, we set a seed value for reproducibility. Then, we set the cohort sample size and summary statistics (in this case, proportions) for each of the baseline characteristics and for the binary outcome. The summary statistics are those from the actual RELATIVITY-047 trial. Finally, the baseline characteristics and ORR are simulated from binomial distributions based on summary statistics and randomly assigned to patients.
This creates the patient-level dataset including baseline characteristics and one binary outcome.
👇 Expand the code chunk below to view the code used to create simulated IPD for RELATIVITY-047
Show R Code
# Set seed for reproducibility
set.seed(42)

# Summary-level parameters

## Cohort sample size ##
N_patients_NR <- 136  # NR: Nivolumab-Relatlimab treated patients from RELATIVITY-047 with BRAF mutations

## Baseline characteristics ##
age_above_55_NR            <- 0.404
female_pc_NR               <- 0.485  # pc: proportion of the cohort
ecog_0_pc_NR               <- 0.706  # 70.6% of patients had ECOG 0
ldh_great_uln_pc_NR        <- 0.338  # 33.8% of patients had LDH > ULN
disease_sites_above3_pc_NR <- 0.419  # 41.9% of patients had 3 or more sites of disease
mets_M0_M1a_pc_NR          <- 0.309  # 30.9% of patients had M0 or M1a disease
mets_M1b_pc_NR             <- 0.25   # 25% of patients had M1b disease
immunotherapy_NR           <- 0.118  # 11.8% of patients had prior immunotherapy

## Outcomes ##
## Binary outcomes (ORR) ##
ORR_investigator_NR <- 0.480184  # this value was estimated, not taken from data

## Create patient-level dataset including baseline characteristics and ORR ##
NR.IPD <- data.frame(
  # Sample size
  n = N_patients_NR,
  # Patient IDs
  ID = seq(1, N_patients_NR, by = 1),
  # Baseline characteristics
  age_above_55         = rbinom(N_patients_NR, 1, age_above_55_NR),
  gender_female        = rbinom(N_patients_NR, 1, female_pc_NR),
  ecog_0               = rbinom(N_patients_NR, 1, ecog_0_pc_NR),
  ldh_great_uln        = rbinom(N_patients_NR, 1, ldh_great_uln_pc_NR),
  disease_sites_above3 = rbinom(N_patients_NR, 1, disease_sites_above3_pc_NR),
  mets_M0_M1a          = rbinom(N_patients_NR, 1, mets_M0_M1a_pc_NR),
  immunotherapy        = rbinom(N_patients_NR, 1, immunotherapy_NR),
  # Binary outcomes
  ORR_investigator = rbinom(N_patients_NR, 1, ORR_investigator_NR),
  # Treatment
  trt = "NIVO+RELA"
)
A tabular view of the synthetic dataset is shown below.
Show R Code
# Define a named vector mapping variable names to readable labels
var_labels <- c(
  age_above_55         = "Age > 55",
  gender_female        = "Female",
  ecog_0               = "ECOG Performance Status 0",
  ldh_great_uln        = "LDH > Upper Limit of Normal",
  disease_sites_above3 = "≥3 Disease Sites",
  mets_M0_M1a          = "M0/M1a Metastases",
  immunotherapy        = "Prior Immunotherapy",
  ORR_investigator     = "Objective Response Rate (Investigator-Assessed)"
)

# Rename columns using `rename_with()`
NR.IPD_readable <- NR.IPD %>%
  rename_with(
    ~ sapply(.x, function(col) ifelse(col %in% names(var_labels), var_labels[[col]], col)),
    everything()
  )

# Step 1: Create the GT table
ipd_table <- NR.IPD_readable %>%
  gt() %>%
  tab_header(title = html("<b>Simulated Patient-Level Data for NIVO+RELA</b>")) %>%
  cols_align(align = "center", columns = everything())

# Step 2: Render the table inside a scrollable HTML div
htmltools::div(style = "overflow-x: auto; height: 400px;", ipd_table)
Simulated Patient-Level Data for NIVO+RELA

| n | ID | Age > 55 | Female | ECOG Performance Status 0 | LDH > Upper Limit of Normal | ≥3 Disease Sites | M0/M1a Metastases | Prior Immunotherapy | Objective Response Rate (Investigator-Assessed) | trt |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 136 | 1 | 1 | 1 | 1 | 0 | 0 | 0 | 1 | 0 | NIVO+RELA |
| 136 | 2 | 1 | 0 | 1 | 0 | 1 | 1 | 0 | 1 | NIVO+RELA |
| 136 | 3 | 0 | 1 | 0 | 0 | 1 | 1 | 0 | 0 | NIVO+RELA |
| 136 | 4 | 1 | 1 | 1 | 0 | 0 | 1 | 0 | 1 | NIVO+RELA |
| 136 | 5 | 1 | 0 | 1 | 0 | 0 | 0 | 0 | 1 | NIVO+RELA |
| 136 | 6 | 0 | 0 | 1 | 1 | 0 | 1 | 0 | 1 | NIVO+RELA |
| 136 | 7 | 1 | 0 | 1 | 0 | 1 | 0 | 0 | 1 | NIVO+RELA |
| 136 | 8 | 0 | 1 | 1 | 0 | 0 | 0 | 0 | 1 | NIVO+RELA |
| 136 | 9 | 1 | 1 | 1 | 0 | 1 | 1 | 0 | 0 | NIVO+RELA |
| 136 | 10 | 1 | 1 | 1 | 1 | 0 | 0 | 1 | 1 | NIVO+RELA |
| 136 | 11 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | NIVO+RELA |
| 136 | 12 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | NIVO+RELA |
| 136 | 13 | 1 | 0 | 1 | 0 | 1 | 1 | 0 | 0 | NIVO+RELA |
| 136 | 14 | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 1 | NIVO+RELA |
| 136 | 15 | 0 | 1 | 1 | 1 | 1 | 0 | 0 | 0 | NIVO+RELA |
| 136 | 16 | 1 | 0 | 1 | 1 | 1 | 1 | 0 | 0 | NIVO+RELA |
| 136 | 17 | 1 | 1 | 1 | 0 | 1 | 0 | 0 | 1 | NIVO+RELA |
| 136 | 18 | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | NIVO+RELA |
| 136 | 19 | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | NIVO+RELA |
| 136 | 20 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | NIVO+RELA |
| 136 | 21 | 1 | 0 | 1 | 1 | 0 | 1 | 0 | 1 | NIVO+RELA |
| 136 | 22 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 1 | NIVO+RELA |
| 136 | 23 | 1 | 0 | 1 | 0 | 1 | 0 | 1 | 0 | NIVO+RELA |
| 136 | 24 | 1 | 1 | 1 | 0 | 0 | 1 | 0 | 0 | NIVO+RELA |
| 136 | 25 | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | NIVO+RELA |
| 136 | 26 | 0 | 1 | 1 | 0 | 1 | 1 | 0 | 0 | NIVO+RELA |
| 136 | 27 | 0 | 1 | 1 | 1 | 1 | 0 | 0 | 1 | NIVO+RELA |
| 136 | 28 | 1 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | NIVO+RELA |
| 136 | 29 | 0 | 1 | 1 | 0 | 1 | 0 | 0 | 0 | NIVO+RELA |
| 136 | 30 | 1 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | NIVO+RELA |
| 136 | 31 | 1 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | NIVO+RELA |
| 136 | 32 | 1 | 0 | 1 | 0 | 1 | 1 | 0 | 1 | NIVO+RELA |
| 136 | 33 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | NIVO+RELA |
| 136 | 34 | 1 | 0 | 0 | 1 | 0 | 1 | 0 | 0 | NIVO+RELA |
| 136 | 35 | 0 | 1 | 1 | 0 | 1 | 1 | 0 | 1 | NIVO+RELA |
| 136 | 36 | 1 | 0 | 1 | 1 | 0 | 1 | 0 | 0 | NIVO+RELA |
| 136 | 37 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | NIVO+RELA |
| 136 | 38 | 0 | 0 | 1 | 0 | 1 | 0 | 0 | 1 | NIVO+RELA |
| 136 | 39 | 1 | 0 | 1 | 0 | 1 | 0 | 0 | 1 | NIVO+RELA |
| 136 | 40 | 1 | 0 | 0 | 1 | 1 | 0 | 0 | 0 | NIVO+RELA |
| 136 | 41 | 0 | 1 | 1 | 1 | 0 | 0 | 0 | 1 | NIVO+RELA |
| 136 | 42 | 0 | 1 | 1 | 0 | 0 | 0 | 1 | 1 | NIVO+RELA |
| 136 | 43 | 0 | 1 | 1 | 0 | 1 | 0 | 0 | 0 | NIVO+RELA |
| 136 | 44 | 1 | 1 | 0 | 0 | 1 | 0 | 1 | 0 | NIVO+RELA |
| 136 | 45 | 0 | 1 | 1 | 1 | 0 | 0 | 1 | 1 | NIVO+RELA |
| 136 | 46 | 1 | 1 | 1 | 0 | 1 | 1 | 0 | 0 | NIVO+RELA |
| 136 | 47 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | NIVO+RELA |
| 136 | 48 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | NIVO+RELA |
| 136 | 49 | 1 | 1 | 1 | 0 | 0 | 0 | 0 | 1 | NIVO+RELA |
| 136 | 50 | 1 | 1 | 1 | 1 | 1 | 0 | 0 | 0 | NIVO+RELA |
| 136 | 51 | 0 | 1 | 1 | 1 | 1 | 0 | 0 | 1 | NIVO+RELA |
| 136 | 52 | 0 | 1 | 1 | 1 | 0 | 1 | 0 | 1 | NIVO+RELA |
| 136 | 53 | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | NIVO+RELA |
| 136 | 54 | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | NIVO+RELA |
| 136 | 55 | 0 | 0 | 1 | 1 | 0 | 0 | 0 | 1 | NIVO+RELA |
| 136 | 56 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | NIVO+RELA |
| 136 | 57 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | NIVO+RELA |
| 136 | 58 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | NIVO+RELA |
| 136 | 59 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | NIVO+RELA |
| 136 | 60 | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 1 | NIVO+RELA |
| 136 | 61 | 1 | 1 | 1 | 0 | 0 | 1 | 0 | 0 | NIVO+RELA |
| 136 | 62 | 1 | 0 | 1 | 1 | 0 | 0 | 0 | 0 | NIVO+RELA |
| 136 | 63 | 1 | 1 | 1 | 0 | 1 | 0 | 1 | 0 | NIVO+RELA |
| 136 | 64 | 0 | 1 | 1 | 0 | 0 | 1 | 0 | 1 | NIVO+RELA |
| 136 | 65 | 1 | 1 | 1 | 1 | 0 | 1 | 1 | 0 | NIVO+RELA |
| 136 | 66 | 0 | 1 | 1 | 1 | 0 | 0 | 0 | 0 | NIVO+RELA |
| 136 | 67 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | NIVO+RELA |
| 136 | 68 | 1 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | NIVO+RELA |
| 136 | 69 | 1 | 0 | 1 | 0 | 1 | 1 | 1 | 1 | NIVO+RELA |
| 136 | 70 | 0 | 0 | 1 | 1 | 0 | 0 | 0 | 0 | NIVO+RELA |
| 136 | 71 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 0 | NIVO+RELA |
| 136 | 72 | 0 | 0 | 1 | 1 | 1 | 0 | 0 | 0 | NIVO+RELA |
| 136 | 73 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | NIVO+RELA |
| 136 | 74 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | NIVO+RELA |
| 136 | 75 | 0 | 1 | 0 | 1 | 0 | 0 | 0 | 0 | NIVO+RELA |
| 136 | 76 | 1 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | NIVO+RELA |
| 136 | 77 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 1 | NIVO+RELA |
| 136 | 78 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 1 | NIVO+RELA |
| 136 | 79 | 0 | 0 | 1 | 1 | 0 | 1 | 0 | 1 | NIVO+RELA |
| 136 | 80 | 0 | 1 | 1 | 0 | 1 | 1 | 1 | 0 | NIVO+RELA |
| 136 | 81 | 0 | 1 | 1 | 0 | 0 | 1 | 0 | 0 | NIVO+RELA |
| 136 | 82 | 0 | 0 | 1 | 0 | 1 | 1 | 0 | 0 | NIVO+RELA |
| 136 | 83 | 0 | 1 | 0 | 1 | 0 | 0 | 0 | 0 | NIVO+RELA |
| 136 | 84 | 1 | 1 | 1 | 0 | 0 | 0 | 0 | 1 | NIVO+RELA |
| 136 | 85 | 1 | 0 | 1 | 1 | 1 | 0 | 0 | 1 | NIVO+RELA |
| 136 | 86 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | NIVO+RELA |
| 136 | 87 | 0 | 1 | 1 | 0 | 0 | 1 | 0 | 0 | NIVO+RELA |
| 136 | 88 | 0 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | NIVO+RELA |
| 136 | 89 | 0 | 0 | 1 | 1 | 0 | 0 | 0 | 1 | NIVO+RELA |
| 136 | 90 | 0 | 1 | 1 | 0 | 0 | 0 | 1 | 0 | NIVO+RELA |
| 136 | 91 | 1 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | NIVO+RELA |
| 136 | 92 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 1 | NIVO+RELA |
| 136 | 93 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 1 | NIVO+RELA |
| 136 | 94 | 1 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | NIVO+RELA |
| 136 | 95 | 1 | 0 | 1 | 0 | 1 | 1 | 0 | 1 | NIVO+RELA |
| 136 | 96 | 1 | 0 | 0 | 1 | 1 | 0 | 0 | 1 | NIVO+RELA |
| 136 | 97 | 0 | 0 | 1 | 0 | 0 | 1 | 1 | 1 | NIVO+RELA |
| 136 | 98 | 0 | 0 | 1 | 1 | 0 | 1 | 1 | 0 | NIVO+RELA |
| 136 | 99 | 1 | 1 | 1 | 0 | 0 | 1 | 0 | 1 | NIVO+RELA |
| 136 | 100 | 1 | 1 | 1 | 0 | 1 | 1 | 0 | 0 | NIVO+RELA |
| 136 | 101 | 1 | 0 | 0 | 1 | 0 | 0 | 1 | 0 | NIVO+RELA |
| 136 | 102 | 0 | 1 | 1 | 0 | 1 | 0 | 0 | 1 | NIVO+RELA |
| 136 | 103 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | NIVO+RELA |
| 136 | 104 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 1 | NIVO+RELA |
| 136 | 105 | 1 | 0 | 1 | 0 | 1 | 0 | 0 | 1 | NIVO+RELA |
| 136 | 106 | 1 | 1 | 1 | 0 | 1 | 0 | 1 | 1 | NIVO+RELA |
| 136 | 107 | 1 | 0 | 1 | 0 | 0 | 0 | 1 | 0 | NIVO+RELA |
| 136 | 108 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | NIVO+RELA |
| 136 | 109 | 0 | 1 | 1 | 0 | 1 | 0 | 0 | 0 | NIVO+RELA |
| 136 | 110 | 0 | 0 | 1 | 0 | 1 | 0 | 0 | 1 | NIVO+RELA |
| 136 | 111 | 1 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | NIVO+RELA |
| 136 | 112 | 1 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | NIVO+RELA |
| 136 | 113 | 1 | 0 | 1 | 1 | 0 | 0 | 0 | 0 | NIVO+RELA |
| 136 | 114 | 0 | 1 | 0 | 1 | 0 | 0 | 0 | 1 | NIVO+RELA |
| 136 | 115 | 0 | 0 | 1 | 1 | 0 | 0 | 0 | 1 | NIVO+RELA |
| 136 | 116 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | NIVO+RELA |
| 136 | 117 | 0 | 0 | 1 | 0 | 1 | 0 | 1 | 0 | NIVO+RELA |
| 136 | 118 | 0 | 1 | 1 | 1 | 1 | 0 | 1 | 1 | NIVO+RELA |
| 136 | 119 | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 1 | NIVO+RELA |
| 136 | 120 | 1 | 1 | 1 | 1 | 1 | 0 | 0 | 0 | NIVO+RELA |
| 136 | 121 | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | NIVO+RELA |
| 136 | 122 | 0 | 1 | 1 | 0 | 1 | 1 | 0 | 1 | NIVO+RELA |
| 136 | 123 | 0 | 1 | 0 | 1 | 0 | 0 | 0 | 0 | NIVO+RELA |
| 136 | 124 | 0 | 1 | 1 | 0 | 0 | 0 | 0 | 1 | NIVO+RELA |
| 136 | 125 | 1 | 1 | 0 | 1 | 1 | 0 | 0 | 1 | NIVO+RELA |
| 136 | 126 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | NIVO+RELA |
| 136 | 127 | 1 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | NIVO+RELA |
| 136 | 128 | 1 | 1 | 1 | 0 | 1 | 0 | 0 | 0 | NIVO+RELA |
| 136 | 129 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | NIVO+RELA |
| 136 | 130 | 1 | 0 | 1 | 0 | 0 | 0 | 0 | 1 | NIVO+RELA |
| 136 | 131 | 1 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | NIVO+RELA |
| 136 | 132 | 1 | 0 | 1 | 0 | 0 | 1 | 0 | 1 | NIVO+RELA |
| 136 | 133 | 1 | 1 | 0 | 1 | 0 | 0 | 0 | 1 | NIVO+RELA |
| 136 | 134 | 1 | 1 | 1 | 1 | 0 | 0 | 0 | 0 | NIVO+RELA |
| 136 | 135 | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 1 | NIVO+RELA |
| 136 | 136 | 0 | 1 | 1 | 0 | 0 | 0 | 0 | 1 | NIVO+RELA |
As shown in the summary table below, the synthetic data align with the IPD from Table 1 in the manuscript but exhibit some differences due to the random generation process used in the code above. As a result, they serve as pseudo-IPD for this worked example.
Show R Code
NR.IPD_readable %>%
  select(-ID, -trt) %>%  # Exclude ID and treatment columns
  tbl_summary(
    statistic = all_categorical() ~ "{n} ({p}%)",  # Ensures n (%) format
    missing = "no"
  ) %>%
  # add_n() %>%  # Adds "N = 136" at the top but removes redundant row below
  modify_header(label = "**Characteristic**") %>%
  modify_spanning_header(all_stat_cols() ~ "**NIVO+RELA**") %>%
  modify_footnote(update = everything() ~ "n (%)") %>%  # Ensure proper footnote
  bold_labels() %>%
  modify_table_body(~ .x %>% filter(variable != "n"))  # Removes the redundant first two rows
| Characteristic¹ | NIVO+RELA, N = 136¹ |
| --- | --- |
| Age > 55 | 65 (48%) |
| Female | 62 (46%) |
| ECOG Performance Status 0 | 99 (73%) |
| LDH > Upper Limit of Normal | 39 (29%) |
| ≥3 Disease Sites | 48 (35%) |
| M0/M1a Metastases | 45 (33%) |
| Prior Immunotherapy | 19 (14%) |
| Objective Response Rate (Investigator-Assessed) | 60 (44%) |

¹ n (%)
4 Reconstruction of IPD from Published Kaplan-Meier Curves
Algorithm for Reconstruction
4.1 Overview
Background and Goals
This section presents a reproducible R-based implementation of the method developed by Guyot et al. (2012) for reconstructing individual patient-level data from published Kaplan-Meier survival curves.
This reconstruction is essential for indirect comparisons—including MAIC or network meta-analyses—when patient-level data are not publicly available.
The approach requires key inputs:
Published Kaplan-Meier Curve Image
An image or PDF of the KM curve from the original publication.
Digitized Kaplan-Meier Curve Data
Extracted (x, y) coordinates from the survival curve using a digitization tool, such as WebPlotDigitizer.
Format: CSV with two columns:
x: Time points along the x-axis (e.g., months)
Curve1: Corresponding survival probabilities.
Number-at-Risk Table
Extracted counts of patients at risk at key time points.
Format: CSV with the following columns:
Time: Time points where the number at risk is reported
Nrisk: Number of patients at risk
Median Survival, CI, Number of Events, and Unit (as reported)
We then apply the reconstruction algorithm to generate individual-level event data suitable for survival analysis.
Reference:
Guyot P, Ades AE, Ouwens MJ, Welton NJ. Enhanced secondary analysis of survival data: reconstructing the data from published Kaplan-Meier survival curves. BMC Medical Research Methodology. 2012;12:9. doi:10.1186/1471-2288-12-9
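To make the expected input formats concrete, here is a minimal sketch of the two inputs once read into R. All values below are invented for illustration only; they are not taken from any trial:

```r
# Hypothetical digitized KM curve data: time (x) and survival probability (Curve1)
digC_example <- data.frame(
  x = c(0, 2, 4, 6, 8),
  Curve1 = c(1.00, 0.92, 0.81, 0.68, 0.55)
)

# Hypothetical number-at-risk table: reported time points and at-risk counts
nRisk_example <- data.frame(
  Time = c(0, 4, 8),
  Nrisk = c(192, 151, 98)
)
```

A quick sanity check on such inputs is that the survival probabilities are non-increasing in time and start at 1 at time zero.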
4.2 Visual Reference: Published Kaplan-Meier Curve (Figure S1)
Source Figure Reference
For this example, we use Figure S1 from the supplementary material of Dummer et al., 2018 (Lancet Oncology). This figure shows Progression-Free Survival (PFS) curves comparing COMBO450 to VEM or ENCO300 arms.
This figure is the source for our digitization process. We use WebPlotDigitizer to extract the data points, followed by reconstruction using the Guyot method.
Reference:
Dummer R, et al. Encorafenib plus binimetinib versus vemurafenib or encorafenib in patients with BRAF-mutant melanoma (COLUMBUS): a multicentre, open-label, randomised phase 3 trial. Lancet Oncology. 2018;19(5):603-615. doi:10.1016/S1470-2045(18)30142-6
4.3 Load Digitized Kaplan-Meier Curve and Number-at-Risk Data
Data Extraction
The reconstruction algorithm requires two input files in .csv format:
Digitized Kaplan-Meier Curve Data — containing time (x) and survival probability (y) points extracted from the published curve (see above).
Number-at-Risk Table — containing the number of patients at risk at specific time points, extracted directly from the publication.
Both files are read using data.table::fread() for efficiency.
Show R Code
# Define file paths using here::here() for portability
# Read digitized KM curve data and number-at-risk table
data_path <- here::here("files/ENCO+BINI PFS Digitization Example") # Change to accommodate the path of your files
km_file <- file.path(data_path, "PFS_Dummer2018.csv")
file.exists(km_file)
We first inspect the digitized Kaplan-Meier curve data (digC) and the number-at-risk table (nRisk) to confirm successful data loading and correct formatting.
# Load number-at-risk data and remove rows where Nrisk is zero (done for analysis purposes)
nRisk <- fread(nrisk_file) %>% filter(Nrisk != 0)
Note: These values should align with the number-at-risk figures presented at the bottom of the Kaplan-Meier plot (Figure S1).
4.4 Reconstruction of IPD
Algorithm for Reconstruction
4.4.1 Overview of Reconstruction Algorithm
Overview of Algorithm
To recreate individual patient-level data from the digitized Kaplan-Meier curve, we implemented the algorithm from Guyot et al., 2012. This method enables survival analysis when only aggregate survival curves and number-at-risk tables are available.
4.4.1.1 High-Level Workflow and Function Roles
The reconstruction process is modular, with each function handling a specific step:
Align Number-at-Risk Data to KM Curve — Nrisk_low_up()
Matches the number-at-risk time points to the digitized survival curve
Ensures proper interval mapping for event/censor distribution
Core Reconstruction Logic — reconKMGuyot()
Iteratively estimates the number of events and censoring within each interval
Reconstructs synthetic event and censoring times for individual patients
Outputs a patient-level dataset ready for survival analysis
Flexible Wrapper for Reconstruction — FUN_KM_RECON()
Coordinates the overall process
Handles input flexibility (with or without number-at-risk table)
Calls Nrisk_low_up() and reconKMGuyot() in sequence
Returns the reconstructed IPD
Pipeline Execution — process_KM_data()
Streamlines function calls
Prepares the dataset for downstream survival analysis
4.4.1.2 Summary
Together, these functions produce a synthetic individual patient dataset containing event times and status (event or censor). This dataset can then be used for survival modeling, matching-adjusted indirect comparisons, or other analyses requiring patient-level data.
4.4.2 Number-at-Risk Alignment Function (Nrisk_low_up())
This function aligns number-at-risk time points with the digitized survival curve, ensuring accurate distribution of events and censored observations.
👇 Expand the code chunk below to view the function definition:
Show R Code
Nrisk_low_up <- function(
  t.risk0, # Vector of original time points from the number-at-risk table
  t.S1,    # Vector of time points from the digitized survival (KM) curve
  n.risk0  # Vector of number-at-risk counts at each t.risk0 time point
) {
  # Iterate over each number-at-risk interval (excluding first and last)
  for (i in 2:(length(t.risk0) - 1)) {
    # Identify survival curve time points (t.S1) that fall within the current at-risk interval
    temp <- which((t.risk0[i] <= t.S1) & (t.S1 < t.risk0[i + 1]))
    # If no points fall within this interval, mark the at-risk time and count as invalid (-1)
    if (length(temp) == 0) {
      t.risk0[i] <- -1
      n.risk0[i] <- -1
      print(paste0("No events/points found between ", t.risk0[i], " and ", t.risk0[i + 1]))
    }
  }
  # Remove any invalid (-1) time points and counts
  t.risk1 <- t.risk0[t.risk0 != -1]
  n.risk1 <- n.risk0[n.risk0 != -1]
  # For each valid at-risk interval, find the corresponding indices (lower and upper) in t.S1
  lower1 <- sapply(1:(length(t.risk1) - 1), function(i) min(which((t.risk1[i] <= t.S1) & (t.S1 < t.risk1[i + 1]))))
  upper1 <- sapply(1:(length(t.risk1) - 1), function(i) max(which((t.risk1[i] <= t.S1) & (t.S1 < t.risk1[i + 1]))))
  # Handle the last interval separately to ensure coverage to the end of t.S1
  lower1 <- c(lower1, min(which(t.risk1[length(t.risk1)] <= t.S1)))
  upper1 <- c(upper1, max(which(t.risk1[length(t.risk1)] <= t.S1)))
  # Return a list of cleaned and matched at-risk times and their index mappings
  list(
    "lower1" = lower1,   # Lower index in t.S1 for each at-risk interval
    "upper1" = upper1,   # Upper index in t.S1 for each at-risk interval
    "t.risk1" = t.risk1, # Cleaned vector of valid at-risk time points
    "n.risk1" = n.risk1  # Cleaned vector of valid number-at-risk counts
  )
}
4.4.3 Core Reconstruction Function (reconKMGuyot())
Core Reconstruction Function: reconKMGuyot()
The reconKMGuyot() function performs the core survival reconstruction.
Purpose:
Iterates through the number-at-risk intervals
Estimates the number of events and censoring between each time point
Reconstructs individual patient event and censor times
Outputs a synthetic individual patient dataset ready for survival analysis
This is a direct implementation of the algorithm described by Guyot et al. (2012).
👇 Expand the code chunk below to view the function definition:
Show R Code
# Main reconstruction function based on Guyot et al. algorithm
reconKMGuyot <- function(
  tot.events, # Total number of events reported in the study
  t.S,        # Vector of time points from the digitized KM curve
  S,          # Vector of survival probabilities corresponding to t.S
  t.risk,     # Vector of time points where number-at-risk is reported
  lower,      # Vector of lower indices mapping number-at-risk intervals to t.S
  upper,      # Vector of upper indices mapping number-at-risk intervals to t.S
  n.risk,     # Vector of number-at-risk values at each t.risk time point
  tol         # Tolerance value (typically for convergence, not directly used here)
) {
  # Number of risk intervals
  n.int <- length(n.risk)
  # Total number of survival time points
  n.t <- upper[n.int]
  # Initialize vectors to store:
  # Estimated number of censorings per interval
  n.censor <- rep(0, n.int - 1)
  # Estimated number at risk at each time point
  n.hat <- rep(n.risk[1] + 1, n.t)
  # Number of censorings at each time point
  cen <- rep(0, n.t)
  # Number of events at each time point
  d <- rep(0, n.t)
  # Estimated Kaplan-Meier survival probability at each time point
  KM.hat <- rep(1, n.t)
  # Track the last time point with an event
  last.i <- rep(1, n.int)
  # Sum of events (for book-keeping, not used here)
  sumdL <- 0

  # Start looping through each risk interval to estimate events and censoring
  if (n.int > 1) {
    for (i in 1:(n.int - 1)) {
      # Estimate number of censored patients between interval i and i+1
      n.censor[i] <- round(n.risk[i] * S[lower[i + 1]] / S[lower[i]] - n.risk[i + 1])
      # Adjust estimated number of censorings iteratively until number at risk aligns
      while ((n.hat[lower[i + 1]] > n.risk[i + 1]) ||
             ((n.hat[lower[i + 1]] < n.risk[i + 1]) && (n.censor[i] > 0))) {
        if (n.censor[i] <= 0) {
          # If no censoring, set all censoring counts in this interval to zero
          cen[lower[i]:upper[i]] <- 0
          n.censor[i] <- 0
        } else {
          # Distribute censoring events uniformly across the interval
          cen.t <- t.S[lower[i]] + (1:n.censor[i]) * (t.S[lower[i + 1]] - t.S[lower[i]]) / (n.censor[i] + 1)
          # Bin the censor times into the respective survival time intervals
          cen[lower[i]:upper[i]] <- hist(cen.t, breaks = t.S[lower[i]:lower[i + 1]], plot = FALSE)$counts
        }
        # Set starting number at risk for this interval
        n.hat[lower[i]] <- n.risk[i]
        # Initialize the last time point with event
        last <- last.i[i]
        # Iterate through each time point in this interval
        for (k in lower[i]:upper[i]) {
          # Estimate events at time k using conditional survival probability
          d[k] <- if (i == 1 && k == lower[i]) 0 else round(n.hat[k] * (1 - (S[k] / KM.hat[last])))
          # Update KM estimate based on new events
          KM.hat[k] <- KM.hat[last] * (1 - (d[k] / n.hat[k]))
          # Update number at risk for next time point
          n.hat[k + 1] <- n.hat[k] - d[k] - cen[k]
          # Update last event time if an event occurred at k
          if (d[k] != 0) last <- k
        }
        # Adjust censoring based on updated number at risk
        n.censor[i] <- n.censor[i] + (n.hat[lower[i + 1]] - n.risk[i + 1])
      }
      # Ensure that the estimated number at risk aligns with the known number
      if (n.hat[lower[i + 1]] < n.risk[i + 1]) n.risk[i + 1] <- n.hat[lower[i + 1]]
      # Store last event time point for next interval
      last.i[i + 1] <- last
    }
  }

  # -------- Reconstruct Individual Patient Data (IPD) ----------
  # Initialize IPD time vector (default all to last time point)
  t.IPD <- rep(t.S[n.t], n.risk[1])
  # Initialize event indicator vector (0 = censored by default)
  event.IPD <- rep(0, n.risk[1])
  # Tracker for indexing patients
  k <- 1
  # Assign event times to the patients
  for (j in 1:n.t) {
    if (d[j] != 0) {
      t.IPD[k:(k + d[j] - 1)] <- rep(t.S[j], d[j])  # Assign event time
      event.IPD[k:(k + d[j] - 1)] <- rep(1, d[j])   # Mark as event
      k <- k + d[j]                                 # Move the index forward
    }
  }
  # Assign censoring times (placed between survival times)
  for (j in 1:(n.t - 1)) {
    if (cen[j] != 0) {
      # Censoring time is placed mid-interval
      t.IPD[k:(k + cen[j] - 1)] <- rep((t.S[j] + t.S[j + 1]) / 2, cen[j])
      event.IPD[k:(k + cen[j] - 1)] <- rep(0, cen[j]) # Mark as censored
      k <- k + cen[j]
    }
  }
  # Create the final IPD dataframe
  IPD <- data.frame(t = t.IPD, ev = event.IPD)
  # Final adjustment to ensure the last time aligns with survival curve max
  if (IPD[nrow(IPD), "t"] < t.S[length(t.S)]) {
    IPD[nrow(IPD), "t"] <- t.S[length(t.S)]
  }
  # Return both the reconstructed event/censoring table and the IPD
  return(list(dat = cbind(t.S, S, n.hat[1:n.t], d, cen), ipd = IPD))
}
4.4.4 Wrapper Function (FUN_KM_RECON())
Wrapper Function: FUN_KM_RECON()
The FUN_KM_RECON() function wraps the full reconstruction process. It:
Prepares the digitized survival data (rawkmfile)
Ensures the time = 0 anchor is present
Aligns the number-at-risk table
Calls reconKMGuyot() to generate the individual patient data (IPD)
This function provides flexibility to:
Run the reconstruction with or without number-at-risk data
Use a known total event count (totev) or just the total patient count (totp) if number-at-risk is missing
👇 Expand the code chunk below to view the function definition:
Show R Code
FUN_KM_RECON <- function(
  rawkmfile,    # Digitized Kaplan-Meier survival data (time and probability)
  rawnriskfile, # Number-at-risk table with time and counts (can be NULL)
  totev,        # Total number of events reported for the study
  totp = 0      # Total number of patients (used if number-at-risk table is missing)
) {
  # Rename the digitized KM data columns for clarity: 't' for time and 'p' for survival probability
  colnames(rawkmfile) <- c("t", "p")
  # Ensure that time zero is included in the KM data (this anchors the curve at survival probability = 1)
  if (sum(rawkmfile$t == 0) == 0) {
    print("Time zero not present in KM data. Adding time = 0 and survival = 1.")
    new_row <- data.table(t = 0, p = 1)    # If missing, create a new row at time zero with survival = 1
    rawkmfile <- rbind(new_row, rawkmfile) # Add it to the top of the KM dataset
  } else {
    print("Time zero found in KM data. Setting survival at time zero to 1.")
    rawkmfile$p[which(rawkmfile$t == 0)] <- 1 # If time zero exists, make sure survival probability is set to 1
  }
  # Check if the number-at-risk table is provided
  if (!is.null(rawnriskfile)) {
    print("Number-at-risk data provided. Proceeding with alignment and full reconstruction.")
    # Align the number-at-risk data to the KM data using the helper function
    nrisk_bounds <- Nrisk_low_up(
      t.risk0 = rawnriskfile$Time,  # Times from the number-at-risk table
      t.S1 = rawkmfile$t,           # Times from the digitized KM curve
      n.risk0 = rawnriskfile$Nrisk  # Number-at-risk counts
    )
    # Perform the core reconstruction of patient-level data
    ipd_recon <- reconKMGuyot(
      tot.events = totev,             # Total events in the trial
      t.S = rawkmfile$t,              # KM curve time points
      S = rawkmfile$p,                # KM survival probabilities
      t.risk = nrisk_bounds$t.risk1,  # Cleaned number-at-risk times
      lower = nrisk_bounds$lower1,    # Lower bounds of survival time mapping
      upper = nrisk_bounds$upper1,    # Upper bounds of survival time mapping
      n.risk = nrisk_bounds$n.risk1,  # Cleaned number-at-risk counts
      tol = 0.01                      # Tolerance for convergence in reconstruction
    )
    print("Reconstruction using number-at-risk table completed.")
    # Return the reconstructed dataset as a list (including survival curve, at-risk table, and IPD)
    return(list(
      surv = rawkmfile,     # The processed KM data
      nrisk = rawnriskfile, # Original number-at-risk table
      IPD = ipd_recon$ipd   # Reconstructed individual patient data (IPD)
    ))
  } else {
    # If no number-at-risk table is provided, reconstruct using only total population count
    print("Number-at-risk data NOT provided. Proceeding with total population count only.")
    ipd_recon <- reconKMGuyot(
      tot.events = totev,          # Total events
      t.S = rawkmfile$t,           # KM time points
      S = rawkmfile$p,             # KM survival probabilities
      t.risk = 0,                  # Placeholder since n.risk is not used here
      lower = 1,                   # Starting index for time mapping
      upper = length(rawkmfile$t), # Ending index
      n.risk = totp,               # Total number of patients (if no risk table)
      tol = 0.01                   # Tolerance for reconstruction
    )
    print("Reconstruction using total patient count completed.")
    # Return the reconstructed dataset (same structure)
    return(list(
      surv = rawkmfile,
      nrisk = rawnriskfile,
      IPD = ipd_recon$ipd
    ))
  }
}
4.4.5 Pipeline Function (process_KM_data())
Pipeline Function: process_KM_data()
The process_KM_data() function provides a clean interface to run the reconstruction workflow.
It:
Accepts the digitized KM data (digC) and the number-at-risk table (nRisk)
Passes these along with the reported number of events to FUN_KM_RECON()
Returns the reconstructed individual patient-level dataset (IPD)
This function simplifies execution and keeps the main script readable.
👇 Expand the code chunk below to view the function definition:
Show R Code
process_KM_data <- function(
  digC, # Digitized Kaplan-Meier curve data (data frame with time and survival probability)
  nRisk # Number-at-risk table (data frame with time points, at-risk counts, and event count)
) {
  # Reconstruct individual patient data (IPD) using the wrapper function FUN_KM_RECON
  IPD_result <- FUN_KM_RECON(
    rawkmfile = digC,                   # Pass digitized KM data
    rawnriskfile = nRisk,               # Pass number-at-risk table
    totev = nRisk$`Number of Events`[1] # Use the total event count from the first row of nRisk table
  )
  # Return the reconstructed IPD as a list (only the individual patient data portion)
  return(list(IPD = IPD_result$IPD))
}
4.4.6 Execute Reconstruction and Preview IPD
Execution: Reconstruct the IPD
We now execute the reconstruction process by applying the pipeline function process_KM_data() to the digitized Kaplan-Meier curve data (digC) and the number-at-risk table (nRisk).
The output is a dataset where each row represents a synthetic individual patient with their corresponding event time and event status:
1 = event occurred
0 = censored observation
👇 Expand the code chunk below to view the execution code:
Show R Code
# Run the reconstruction pipeline
result <- process_KM_data(digC, nRisk)
[1] "Time zero found in KM data. Setting survival at time zero to 1."
[1] "Number-at-risk data provided. Proceeding with alignment and full reconstruction."
[1] "Reconstruction using number-at-risk table completed."
4.5 Quality Control and Manual Inspection of Reconstructed Data
Validation Process
While the reconstruction algorithm provides a powerful tool to recreate individual patient data from published Kaplan-Meier curves, manual review and quality control are critical to ensure the reconstructed data aligns with the published study’s reported metrics.
In practice, reconstructed outputs may not perfectly replicate published numbers-at-risk or median survival estimates due to limitations in the digitization process or algorithm assumptions. Manual adjustments help reconcile these discrepancies and improve fidelity.
4.5.1 Examples of Manual Modifications
In this example, we compared the reconstructed IPD output to the publication and made specific adjustments to better match the reported numbers-at-risk and median progression-free survival. Details of these modifications are documented in the file “Reconstructed_IPD_reconciliation.xlsx” (provided for reference, not used in analyses).
🔧 Adjustments to Censoring and Numbers-at-Risk
Month 8: The reconstructed number-at-risk exceeded the published value by 1. We corrected this by censoring one additional patient before 8 months (adjusted censoring from 8.04 to 7.9 months).
Month 16: The reconstructed number-at-risk exceeded the publication by 4. We shifted four censoring events that originally occurred between 16–20 months to times before 16 months.
🔧 Adjustments to Align Median PFS
The reported median PFS was 14.8 months. Our initial reconstruction estimated 14.9 months.
To align with the published value, we adjusted two event times from 14.84 and 14.91 months to occur precisely at 14.8 months.
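A minimal sketch of this kind of event-time adjustment is shown below. The values are toy numbers chosen for illustration; the actual edits to the reconstructed dataset are documented in the reconciliation spreadsheet:

```r
# Toy reconstructed IPD: t = time in months, ev = 1 (event) / 0 (censored)
ipd <- data.frame(
  t  = c(12.1, 14.84, 14.91, 18.0),
  ev = c(1, 1, 1, 0)
)

# Shift the two event times straddling the published median onto 14.8 months
ipd$t[ipd$t %in% c(14.84, 14.91)] <- 14.8
```

After such a shift, refitting the Kaplan-Meier model confirms whether the reconstructed median now matches the reported value.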
4.5.2 Load the Modified Reconstructed Dataset (Post-QC)
Warning
Manual adjustments were made to improve alignment with the published data. These are documented in Reconstructed_IPD_reconciliation.xlsx.
The final version of the reconstructed IPD dataset—incorporating these manual adjustments—is loaded below for transparency:
Using the reconstructed and modified individual patient data, we fit a Kaplan-Meier survival curve and plot it below. This provides a direct comparison to the original published Figure S1.
Show R Code
# Create a survival object
surv_obj <- Surv(time = modified_reconstructed_ipd$t, event = modified_reconstructed_ipd$ev)

# Fit the Kaplan-Meier model
km_fit <- survfit(surv_obj ~ 1, data = modified_reconstructed_ipd)

# Plot the survival curve with the risk table
ggsurvplot(
  km_fit,
  data = modified_reconstructed_ipd, # Use the same (modified) dataset the model was fit on
  risk.table = TRUE,                 # Show number at risk table
  risk.table.y.text = TRUE,          # Add labels to risk table rows
  risk.table.height = 0.2,           # Adjust risk table size
  break.time.by = 4,                 # X-axis breaks every 4 months
  xlab = "Time (Months)",
  ylab = "Survival Probability",
  surv.median.line = "hv",           # Add horizontal/vertical line at median survival
  ggtheme = theme_minimal(),
  legend = "none"
)
Show R Code
# Extract and print median survival time
median_time <- summary(km_fit)$table["median"]
cat("**Reconstructed Median PFS:**", round(median_time, 2), "months\n")
**Reconstructed Median PFS:** 14.8 months
5 Worked MAIC Example
Worked MAIC Example
5.1 Overview and Objectives
Overview
Having completed the reconstruction and validation of the pseudo-IPD from the published Kaplan-Meier curve, we now shift to the core application of these data: conducting a Matching-Adjusted Indirect Comparison.
The following worked example demonstrates, step-by-step, how to:
✅ Simulate time-to-event outcomes for the NIVO+RELA pseudo-cohort
✅ Estimate patient-level weights to adjust for baseline differences
✅ Compare outcomes against the comparator BRAF/MEK inhibitor cohort (DAB+TRAM)
This section mirrors the analytical framework used in our study and provides fully reproducible R code.
5.2 Simulate Time-To-Event Outcomes for Relativity 047 IPD
Simulation Approach and Rationale
Next, we simulate the time-to-event outcome. Unlike the comparator datasets, where individual-level survival data were reconstructed from published Kaplan-Meier curves, we cannot access the actual patient-level data from the RELATIVITY-047 trial. Due to confidentiality and patient identification concerns, the dataset is not publicly available.
Additionally, digitizing the published RELATIVITY-047 survival curves is not feasible for this example because our analysis focuses specifically on the BRAF-mutant patient subset, which is not separately plotted or reported in the published Kaplan-Meier figures.
As a result, for the purpose of this worked example, we simulate the time-to-event outcome for the NIVO+RELA cohort. The simulated data serve as a stand-in for the actual patient-level data and allow us to demonstrate the MAIC process. Time-to-event outcomes are represented by two variables: a numeric time to event or censoring, and a binary event indicator. For simplicity, we use an exponential distribution to generate the event times.
Show R Code
### Time-to-event outcomes (OS) ###
# Simulating patient-level data for OS

# Define hazard rate lambda (using a high survival time to avoid reaching the median)
assumed_os <- 60 # Assumes a high survival time (in months) so the median is not reached for most patients
lambda <- log(2) / assumed_os # Hazard rate for OS, assuming exponential survival; here we have a low event rate

# Generate event times from an exponential distribution
event_time <- rexp(N_patients_NR, rate = lambda) # Draws a random survival duration for each patient using the hazard rate above

# Generate censoring times from an exponential distribution with a smaller rate
lambda_censor <- lambda / 20 # The censoring hazard rate is 20 times smaller than the event hazard rate, ensuring more censoring than actual events
censor_time <- rexp(N_patients_NR, rate = lambda_censor) # Simulates censoring times

# Cap survival and censoring times at 63.5 months to avoid extreme outliers and ensure realistic follow-up durations
event_time[event_time > 63.5] <- 63.5
censor_time[censor_time > 63.5] <- 63.5

# Ensure exactly 58 events by ranking observed times and censoring the rest
observed_time <- pmin(event_time, censor_time) # Observed time is the minimum of event or censoring time
event_indicator <- ifelse(rank(observed_time) <= 58, 1, 0) # 1 = event, 0 = censored

# Create patient-level OS dataset
patient_data <- data.frame(
  ID = 1:N_patients_NR,
  OS_Time_Months = observed_time,
  OS_Event = event_indicator # 1 = event, 0 = censored
)

# Check final event count (for visualization purposes)
cat("Final Event Count:", sum(patient_data$OS_Event), "\n") # Should be 58
## Merge time-to-event outcome with rest of IPD ##
NR.IPD <- NR.IPD %>%
  mutate(ID = as.numeric(ID)) %>%
  left_join(patient_data, by = "ID")
5.3 Load Comparator Pseudo-IPD: DAB+TRAM
Aggregate Dataset Setup: DAB+TRAM Cohort
For the DAB+TRAM cohort, we rely on summary statistics for baseline characteristics and binary outcomes reported in the trial publications. The time-to-event data (overall survival) have already been reconstructed using the Kaplan-Meier curve digitization and reconstruction approach described earlier in this document.
Specifically, the individual patient-level survival data for DAB+TRAM were generated from the published Kaplan-Meier curves using the Guyot et al. (2012) algorithm. This reconstruction process transformed the digitized survival probabilities and number-at-risk tables into a synthetic dataset approximating the original trial results.
For this worked example, we load the resulting pseudo-IPD directly from a CSV file containing the reconstructed event times and censoring indicators for each individual.
Show R Code
# Summary level data for comparator, DAB+TRAM
N_DT <- 563 # Total number of patients

## Baseline characteristics (frequencies) ##
age_above_55_DT <- 0.5
female_pc_DT <- 0.43339254
ecog_0_pc_DT <- 0.715808
ldh_great_uln_pc_DT <- 0.344583
disease_sites_above3_pc_DT <- 0.488455
mets_M0_M1a_pc_DT <- 0.1669627
mets_M1b_pc_DT <- 0.186500888
immunotherapy_DT <- 0.207815

# Outcomes
ORR_investigator_DT <- 0.680284 # Objective Response Rate (investigator assessed)

# Read in comparator pseudo-IPD from CSV file provided
data_path <- here::here("files/dab tram/")
file_name <- "Pseudo IPD DabrTram.csv"
file_path <- paste0(data_path, file_name)
# file.exists(file_path)
Pseudo.DT <- read.csv(file_path) %>%
  mutate(wt = 1, trt = "DAB+TRAM")

# Create aggregate dataset for comparator (using frequencies, as above)
Agg.DT <- data.frame(
  n = N_DT,
  age_above_55 = age_above_55_DT,
  gender_female = female_pc_DT,
  ecog_0 = ecog_0_pc_DT,
  ldh_great_uln = ldh_great_uln_pc_DT,
  disease_sites_above3 = disease_sites_above3_pc_DT,
  mets_M0_M1a = mets_M0_M1a_pc_DT,
  immunotherapy = immunotherapy_DT,
  ORR_investigator = ORR_investigator_DT
)
5.4 Matching-Adjusted Indirect Comparisons
Overview of MAIC and Weighting Strategy
The approach presented here follows the appendix from Phillippo et al. (2016), and the text has been adapted to our example. MAIC is a technique that adjusts for differences in baseline characteristics between two treatment groups when direct head-to-head trials are unavailable.
5.4.1 Estimating Weights for Each Patient
To ensure that our NIVO+RELA cohort better reflects the characteristics of the DAB+TRAM cohort, we assign each patient a weight that reflects how well their baseline characteristics align with the comparator group. This weight is determined using a logistic propensity score model, a statistical model similar to a logistic regression.
The model assigns weights based on key patient characteristics (such as age, sex, ECOG status, and LDH levels) so that, after weighting, the average baseline characteristics of the NIVO+RELA group match those of the DAB+TRAM group. The weighting model is defined as follows: \[
\log(w_{i}) = \alpha_0 + \alpha_1^T \mathbf{X}_{i}
\] where:
\(w_{i}\) is the weight assigned to each patient \(i\) in the NIVO+RELA group
\(X_{i}\) represents the patient’s baseline characteristics
\(α\) is a set of parameters that are estimated to achieve balance between groups
5.4.2 Conceptual Explanation of the Weighting Approach
\(X_{i}\) is a vector of patient characteristics, such as age and sex.
\(\alpha_1^T \mathbf{X}_{i}\) means each characteristic is multiplied by a weight (\(α\)) to determine its importance in the adjustment process.
Since the model is expressed in terms of the log of the weights, the actual weight values are obtained later by exponentiating the final result (see below). This ensures all weights remain positive.
Since our goal is to match the distributions of these characteristics across cohorts, we solve for the best values of \(α\) using an optimization technique (BFGS method). This ensures that after weighting, the adjusted NIVO+RELA group is statistically similar to DAB+TRAM in terms of baseline characteristics.
The weights are estimated using the method of moments to match the distributions of the baseline characteristics between cohorts. This problem is equivalent to minimizing the following objective function: \[
\sum_{i} \exp(\alpha_1^T \mathbf{X}_{i})
\] under the constraint that: \[\bar{\mathbf{X}}_{(DAB+TRAM)} = 0\]
This means that, after weighting, the average values of the baseline characteristics in the NIVO+RELA group must match those in the DAB+TRAM group.
To achieve this, we center the covariate values by subtracting the mean baseline characteristics of the DAB+TRAM group from the patient data in both trials. This ensures that the two groups are aligned.
The adjustment parameters \(α_{1}\) are then estimated using a numerical optimization method called BFGS (named after its developers: Broyden, Fletcher, Goldfarb, and Shanno). We do not need to estimate \(α_{0}\) , as this term cancels out in the calculations.
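The centering-plus-optimization step described above can be sketched with a toy example. The data below are invented for illustration (a single binary covariate with a hypothetical comparator prevalence of 0.7); they are not trial values:

```r
set.seed(1)
x <- rbinom(200, 1, 0.5)      # toy binary covariate in the IPD
target_mean <- 0.7            # hypothetical comparator (aggregate) prevalence
X <- matrix(x - target_mean)  # center the covariate on the comparator mean

# Method-of-moments objective: sum of exponentiated linear predictors
objfn <- function(a1, X) sum(exp(X %*% a1))

# Estimate alpha1 with BFGS (alpha0 is not needed; it cancels out)
opt <- optim(par = 0, fn = objfn, X = X, method = "BFGS")

w <- exp(X %*% opt$par)       # estimated (unscaled) weights
weighted.mean(x, w)           # matches the 0.7 target after weighting
```

At the optimum, the gradient condition forces the weighted mean of the centered covariate to zero, which is exactly the moment-matching constraint.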
Once these adjustments are applied, the estimated weights for each patient are calculated as: \[
\hat{w}_{i} = \exp\left( \hat{\alpha}_1^T \bar{\mathbf{X}}_{i} \right).
\] Where:
\(\hat{w}_{i}\) (“w hat sub i”) represents the final weight assigned to patient \(i\). The hat notation (\(\hat{}\)) indicates that this is an estimated value, derived from the optimization process.
\(\bar{\mathbf{X}}_{i}\) represents the centered version of patient \(i\)’s baseline characteristics. The bar notation (\(\bar{}\)) means that we have subtracted the mean values of the DAB+TRAM group from the patient’s original characteristics.
\(\hat{\alpha}_1\) is a vector of estimated adjustment parameters. Each element of \(\hat{\alpha}_1\) tells us how strongly a given characteristic (e.g., age, sex, ECOG status) influences the weighting.
As stated above, because the model was built on the log of the weights, we apply the exp() function to recover the actual weight values.
5.4.3 Interpreting the Weights
Once we estimate the weights, we can rescale them to better understand how individual patients contribute to the adjusted comparison. The rescaled weights are interpreted as follows:
A rescaled weight >1 means a patient is overrepresented in the weighted analysis.
A rescaled weight <1 means a patient is underrepresented compared to the original cohort.
A weight close to 1 means the patient’s influence remains largely unchanged.
This method ensures that after weighting, the distribution of key baseline characteristics in NIVO+RELA matches that of DAB+TRAM, allowing for a fairer, less biased comparison.
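As a toy illustration of the rescaling (the raw weights below are invented for illustration):

```r
wt <- c(0.2, 1.5, 0.8, 2.5)        # hypothetical raw weights for four patients
wt.rs <- wt / sum(wt) * length(wt) # rescale so the weights average to 1
wt.rs                              # 0.16 1.20 0.64 2.00
```

After rescaling, a patient with wt.rs = 2.0 counts as two patients in the weighted analysis, while one with wt.rs = 0.16 contributes much less than an average patient.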
Below we create the weights function:
Show R Code
# Function to generate weights (one arm at a time)
weights <- function(ipd, sld, continuous_vars, binary_vars) {
  ## Explanation of the arguments:
  ### ipd → The individual patient data (NIVO+RELA group).
  ### sld → The summary-level data (baseline characteristics of DAB+TRAM).
  ### continuous_vars → A list of continuous variables (e.g., age, lab values).
  ### binary_vars → A list of binary variables (e.g., sex, ECOG status).

  # Define objective function for use in optimization process
  ## Notes: This function computes the sum of exponentiated weighted patient
  ## characteristics; minimizing it finds the alpha values that make the weighted
  ## NIVO+RELA population match the DAB+TRAM cohort
  objfn <- function(a1, X) {
    sum(exp(X %*% a1))
  }

  # Define gradient function for use in optimization process
  gradfn <- function(a1, X) {
    colSums(sweep(X, 1, exp(X %*% a1), "*"))
  }

  # Specify an empty list to store the variables
  variables <- list()

  # Extract the binary variables if not NULL
  if (!is.null(binary_vars)) {
    for (var in binary_vars) {
      variables <- c(variables, list(ipd[[var]]))
    }
  }

  # Extract the continuous variables if not NULL
  # (matched on their reported means; none are used in this worked example)
  if (!is.null(continuous_vars)) {
    for (var in continuous_vars) {
      variables <- c(variables, list(ipd[[var]]))
    }
  }

  # Combine all variables into a single matrix
  X.EM.0 <- do.call(cbind, variables)

  # Prepare the summary-level means for centering (same order as above)
  means <- c()
  if (!is.null(binary_vars)) {
    means <- c(means, unlist(sld[binary_vars]))
  }
  if (!is.null(continuous_vars)) {
    means <- c(means, unlist(sld[continuous_vars]))
  }

  # Center the variables on the summary-level means
  X.EM.0 <- sweep(X.EM.0, 2, means, '-')

  # Optimization step to estimate alpha
  ## Notes: Initial value for "par" is a vector of zeros
  ## X is passed to both objfn and gradfn as an additional argument
  opt1 <- optim(par = rep(0, ncol(X.EM.0)), fn = objfn, gr = gradfn,
                X = X.EM.0, method = "BFGS", control = list(maxit = 50000))
  if (opt1$convergence != 0) stop("Algorithm failed to converge")
  a1 <- opt1$par

  wt <- exp(X.EM.0 %*% a1)            # Calculate weights using estimated alpha parameters
  wt.rs <- (wt / sum(wt)) * nrow(ipd) # Rescale weights

  # Combine ID, wt.rs, and wt into a data frame
  # Note: the order of wt and wt.rs matches ipd since no sorting is done
  result <- data.frame(ID = ipd$ID,
                       wt.rs = wt.rs,
                       wt = wt)
  return(result)
}
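To see the mechanics of this method-of-moments weighting outside R, the sketch below (plain Python with a hypothetical toy covariate, not part of the worked example) finds a single coefficient alpha such that weights exp(alpha · (x − target)) shift a binary covariate's weighted mean onto the summary-level target. This is the one-covariate version of what `optim()` solves jointly over all matching variables.

```python
import math

ipd_x = [1, 1, 0, 0, 0, 1, 0, 1, 0, 0]  # hypothetical binary covariate in the IPD (mean = 0.4)
target = 0.6                             # hypothetical summary-level proportion to match

centered = [x - target for x in ipd_x]   # center on the target, as done for X.EM.0

def weighted_mean(alpha):
    """Weighted mean of the covariate under weights exp(alpha * centered)."""
    w = [math.exp(alpha * c) for c in centered]
    return sum(wi * xi for wi, xi in zip(w, ipd_x)) / sum(w)

# The weighted mean increases with alpha, so a simple bisection suffices here
lo, hi = -20.0, 20.0
for _ in range(200):
    mid = (lo + hi) / 2
    if weighted_mean(mid) < target:
        lo = mid
    else:
        hi = mid
alpha = (lo + hi) / 2

print(round(weighted_mean(alpha), 4))  # ~0.6: weighted IPD mean now matches the target
```

With several covariates there is one alpha per covariate and no closed ordering like this, which is why the R function uses BFGS on the exponential objective with its analytic gradient.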
5.4.4 Variable matching
After defining the weighting function, the next step is to specify which baseline characteristics will be used to match the two cohorts (NIVO+RELA vs. DAB+TRAM).
Show R Code
# Set variables for matching
continuous_vars <- c() # No continuous variables used in this example, so this is empty
binary_vars <- c("age_above_55", "gender_female", "ecog_0", "ldh_great_uln",
                 "disease_sites_above3", "mets_M0_M1a", "immunotherapy")

# age_above_55 → Whether the patient is older than 55 (0 = No, 1 = Yes).
# gender_female → Whether the patient is female (0 = Male, 1 = Female).
# ecog_0 → Whether the patient has an ECOG performance status of 0 (0 = ECOG 1+, 1 = ECOG 0).
# ldh_great_uln → Whether the patient has LDH above the upper limit of normal (ULN) (0 = No, 1 = Yes).
# disease_sites_above3 → Whether the patient has ≥3 sites of disease (0 = <3 sites, 1 = 3+ sites).
# mets_M0_M1a → Whether the patient has M0 or M1a metastatic status (0 = No, 1 = Yes).
# immunotherapy → Whether the patient received prior immunotherapy (0 = No, 1 = Yes).
5.4.5 Generate Weights for Nivo + Rela
Show R Code
# Call weight function to generate (unanchored) weights
weights_unanchored_ <- weights(ipd = NR.IPD, sld = Agg.DT, continuous_vars, binary_vars)

# Generate the GT table for patient-level unanchored weights
weights_unanchored_table <- weights_unanchored_ %>%
  head() %>%
  gt() %>%
  tab_header(title = html("<b>Patient-Level Unanchored Weights from MAIC for NIVO+RELA</b>")) %>%
  cols_align(align = "center", columns = everything())

# Render inside a scrollable HTML div
htmltools::div(style = "overflow-x: auto; height: 400px;", weights_unanchored_table)
Patient-Level Unanchored Weights from MAIC for NIVO+RELA

ID  wt.rs      wt
1   1.1786523  1.0388435
2   0.7099735  0.6257582
3   0.5214058  0.4595580
4   0.3283765  0.2894253
5   0.9709276  0.8557586
6   0.4929441  0.4344723
5.4.6 Merge Weights with Baseline and Outcome Data
Show R Code
# Merge the generated weights with the baseline and outcome data
weights_unanchored <- full_join(weights_unanchored_, NR.IPD, by = "ID")

# Create the GT table to preview merged data
weights_unanchored_table <- weights_unanchored %>%
  head() %>%
  gt() %>%
  tab_header(title = html("<b>Patient-Level Unanchored Weights Merged with Baseline and Outcome Data</b>")) %>%
  cols_align(align = "center", columns = everything())

# Render inside a scrollable HTML div
htmltools::div(style = "overflow-x: auto; height: 400px;", weights_unanchored_table)
Patient-Level Unanchored Weights Merged with Baseline and Outcome Data

ID  wt.rs      wt         n    age_above_55  gender_female  ecog_0  ldh_great_uln  disease_sites_above3  mets_M0_M1a  immunotherapy  ORR_investigator  trt        OS_Time_Months  OS_Event
1   1.1786523  1.0388435  136  1             1              1       0              0                     0            1              0                 NIVO+RELA  63.5000000      0
2   0.7099735  0.6257582  136  1             0              1       0              1                     1            0              1                 NIVO+RELA  33.8213046      1
3   0.5214058  0.4595580  136  0             1              0       0              1                     1            0              0                 NIVO+RELA  63.5000000      0
4   0.3283765  0.2894253  136  1             1              1       0              0                     1            0              1                 NIVO+RELA  63.5000000      0
5   0.9709276  0.8557586  136  1             0              1       0              0                     0            0              1                 NIVO+RELA  0.8996455       1
6   0.4929441  0.4344723  136  0             0              1       1              0                     1            0              1                 NIVO+RELA  63.5000000      0
5.4.7 Summary of the Weight Distribution
Now that we have generated patient-level weights and merged them with the baseline data, the next step is to evaluate the distribution of these weights. This helps ensure that the weighting process is not excessively distorting the dataset and that no individual patient is overrepresented or underrepresented to an extreme degree. The code snippet below provides summary statistics and a histogram of the distribution of the rescaled weights. Here, the rescaled weights range from about 0.30 to 2.55, and the histogram shows that there are no very large weights.
Show R Code
# Summary of weights, histogram
summary(weights_unanchored$wt.rs)
Min. 1st Qu. Median Mean 3rd Qu. Max.
0.3031 0.5903 0.9231 1.0000 1.2634 2.5483
5.4.8 Assess Weight Distribution and Effective Sample Size (ESS)
5.4.8.1 Calculating the Effective Sample Size (ESS)
After applying weights to adjust for differences between the NIVO+RELA and DAB+TRAM cohorts, we need to determine how many patients still contribute meaningfully to the final analysis. The Effective Sample Size (ESS) provides this insight.
The ESS is calculated using the following formula: \[
ESS = \frac{\left(\sum_{i=1}^{n} w_i\right)^2}{\sum_{i=1}^{n} w_i^2}
\] where \(w_i\) is the MAIC weight for patient \(i\).
This formula accounts for how weights are distributed across patients. An ESS lower than the original sample size (136 patients) indicates that some patients received extreme weights, reducing the information the weighted analysis effectively carries.
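As a quick numeric illustration of the formula (plain Python rather than R, using only the six rescaled weights shown in the preview table above, so the result does not reflect the full cohort):

```python
# Rescaled weights (wt.rs) for the first six patients from the preview table
w = [1.1786523, 0.7099735, 0.5214058, 0.3283765, 0.9709276, 0.4929441]

# ESS = (sum of weights)^2 / (sum of squared weights)
ess = sum(w) ** 2 / sum(wi ** 2 for wi in w)
print(round(ess, 1))  # ~5.1 effective patients out of these 6
```

Note that if all weights were equal, the formula would return exactly the sample size; unequal weights always pull the ESS below it.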
5.4.8.2 Why ESS Matters
If the ESS is too low, the weighted analysis is effectively based on fewer patients, which may reduce statistical power and make the results less reliable.
If the ESS remains close to the original sample size (136), most patients still contribute meaningfully to the final results, ensuring robust comparisons between the two treatment groups.
Here, the ESS of 107.9 represents a moderate reduction from the original 136 patients but remains reasonable. This means that, even after reweighting, a sufficient number of patients are still effectively contributing to the comparison.
After weighting, all matched variables are now balanced between the cohorts. For example, the proportion of females in the NIVO+RELA group now exactly matches that of the DAB+TRAM cohort, confirming that the weighting process successfully adjusted for baseline differences.
5.4.9 Comparing Treatment Outcomes
Now that we have adjusted the NIVO+RELA cohort using patient-level weights, we can compare treatment outcomes between NIVO+RELA and DAB+TRAM. Specifically, we analyze objective response rate (ORR) and the hazard ratio (HR) for overall survival (OS) before and after weighting to assess how weighting impacts treatment comparison.
5.4.9.1 Binary outcome (ORR) comparison
In this section, we estimate the ORR for NIVO+RELA, both before and after weighting, to ensure a fair comparison with DAB+TRAM.
Before weighting: The raw ORR reflects the observed response rate in the original NIVO+RELA cohort.
After weighting: The ORR is adjusted using weights that make the NIVO+RELA cohort resemble the DAB+TRAM cohort in terms of key baseline characteristics.
We use logistic regression models to estimate ORR before and after weighting and then compare it with the DAB+TRAM cohort using odds ratios (ORs).
5.4.9.1.1 Step 1: Estimating ORR for the Comparator (DAB+TRAM)
First, we extract the reported ORR for the DAB+TRAM group from the summary data and compute its standard error (SE) using a standard formula for proportions:
Show R Code
# Get DAB+TRAM binary outcome (ORR) rate and SE
ORR_DT <- Agg.DT$ORR_investigator # Extracts reported ORR for DAB+TRAM
ORR_se_DT <- sqrt(Agg.DT$ORR_investigator * (1 - Agg.DT$ORR_investigator) / Agg.DT$n)
The standard error (SE) formula used is: \[
SE = \sqrt{\frac{p(1-p)}{n}}
\] where:
\(p\) = the ORR.
\(n\) = sample size.
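To make the formula concrete, here is a quick check in plain Python with hypothetical values (p = 0.45 and n = 300 are illustrative only, not study data):

```python
import math

p, n = 0.45, 300  # hypothetical ORR and sample size
se = math.sqrt(p * (1 - p) / n)
print(round(se, 4))  # ~0.0287
```

The SE shrinks with larger n, so comparator arms reported from bigger trials contribute tighter intervals to the indirect comparison.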
5.4.9.1.2 Step 2: Estimating ORR for NIVO+RELA (Before Weighting)
Before applying MAIC weights, we estimate the unadjusted ORR for NIVO+RELA using a logistic regression model.
Show R Code
# Unweighted logistic regression model
model_pre <- glm(ORR_investigator ~ 1, data = weights_unanchored, family = binomial())
Why logistic regression?
Since ORR is a binary outcome (responded vs. not responded), logistic regression is used to model probabilities for binary outcomes.
The ~1 notation means we are estimating a single overall probability for response without considering additional variables (i.e., it’s an intercept-only model).
This model outputs log-odds, which we then convert into probabilities using the inverse logit transformation.
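The transformation chain described above can be sketched in plain Python (the log-odds and SE below are hypothetical; the R code that follows applies the same steps to the fitted intercept):

```python
import math

log_odds = -0.8    # hypothetical intercept from an intercept-only logistic model
se_log_odds = 0.2  # hypothetical standard error of that log-odds

# Inverse logit: log-odds -> probability (the estimated response rate)
p = math.exp(log_odds) / (1 + math.exp(log_odds))

# Delta method on the probability scale: se(p) = p * (1 - p) * se(log-odds),
# algebraically the same as the sqrt(v / (1 / (p*(1-p)))^2) form used in the R code
se_p = p * (1 - p) * se_log_odds

print(round(p, 4), round(se_p, 4))
```

The delta method works here because the inverse logit is smooth, so its local slope p(1 − p) rescales the standard error from the log-odds scale to the probability scale.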
5.4.9.1.2.1 Extracting the Variance and Transforming the Estimate
Show R Code
# Sandwich estimator of variance
V.sw_pre <- vcov(model_pre) # Extracts variance-covariance matrix
trt_label_pre <- "(Intercept)" # Intercept corresponds to the log-odds of ORR

# Extract log-odds and its variance
mean_trt_pre <- model_pre$coefficients[trt_label_pre]
v_trt_pre <- V.sw_pre[trt_label_pre, trt_label_pre]

# Convert log-odds to probability (inverse logit transformation)
mean_trt_pre <- exp(mean_trt_pre) / (1 + exp(mean_trt_pre))

# Compute standard error using the delta method
se_trt_pre <- sqrt(v_trt_pre / ((1 / (mean_trt_pre * (1 - mean_trt_pre)))^2))
5.4.9.1.3 Why Use the Sandwich Estimator?
The sandwich estimator corrects for bias in standard error estimation caused by weighting.
Without it, standard errors could be underestimated, making confidence intervals too narrow and giving a false sense of precision.
Why does this happen? When we apply MAIC weights, some patients contribute more than others, breaking the assumption that all observations contribute equally to the analysis.
Using the sandwich estimator ensures that our uncertainty (SE) is correctly adjusted for this imbalance.
5.4.9.2 Step 3: Estimating ORR for NIVO+RELA (After Weighting)
We now apply weights to adjust for baseline differences before estimating ORR.
Show R Code
# Weighted logistic regression model
# (svydesign() and svyglm() come from the survey package)
model_post <- svyglm(
  ORR_investigator ~ 1,                    # Intercept-only: estimates ORR as a single proportion
  design = svydesign(
    id = ~1,                               # No clustering (each row is an independent patient)
    weights = ~weights_unanchored[["wt"]], # Uses MAIC weights
    data = weights_unanchored
  ),
  family = binomial(),                     # Logistic regression for binary outcome
  na.action = na.omit                      # Removes missing values
)
5.4.9.2.0.1 Extracting the Weighted ORR Estimate
Show R Code
# Sandwich estimator of variance
V.sw_post <- vcov(model_post)
trt_label_post <- "(Intercept)"

# Extract the estimated coefficient (log-odds) and its variance
mean_trt_post <- model_post$coefficients[trt_label_post]
v_trt_post <- V.sw_post[trt_label_post, trt_label_post]

# Convert log-odds to probability (inverse logit transformation)
mean_trt_post <- exp(mean_trt_post) / (1 + exp(mean_trt_post))

# Compute standard error of the transformed estimate (delta method, as above)
se_trt_post <- sqrt(v_trt_post / ((1 / (mean_trt_post * (1 - mean_trt_post)))^2))
5.4.9.3 Calculating the Odds Ratio (OR) Between Treatments
Now that we have weighted and unweighted ORRs, we compare NIVO+RELA vs. DAB+TRAM using odds ratios (ORs). The following code creates a function, get_comparison_arms, that computes the odds ratio of NIVO+RELA vs. DAB+TRAM from the mean event rates and SEs calculated above. Because only the NIVO+RELA cohort is weighted in this MAIC, the rates for the DAB+TRAM cohort remain the same in both the before- and after-matching comparisons. The function returns a data frame with the OR, its SE, confidence interval (CI) limits, and p-value.
Show R Code
# Function to compute OR and confidence intervals
get_comparison_arms <- function(mean1, mean2, se1, se2) {
  OR <- (mean1 / (1 - mean1)) / (mean2 / (1 - mean2)) # Compute OR
  logOR <- log(OR) # Convert to log scale
  logOR_se <- sqrt(se1^2 / mean1^2 + se1^2 / ((1 - mean1)^2) +
                   se2^2 / mean2^2 + se2^2 / ((1 - mean2)^2)) # Compute SE of log OR
  logOR_2.5 <- logOR - qnorm(0.975) * logOR_se  # 95% CI lower bound (log scale)
  logOR_97.5 <- logOR + qnorm(0.975) * logOR_se # 95% CI upper bound (log scale)
  out_2.5 <- exp(logOR_2.5) # Convert back to OR scale
  out_97.5 <- exp(logOR_97.5)
  pvalue <- pnorm(abs(logOR / logOR_se), lower.tail = FALSE) * 2 # Two-sided p-value
  ## Note: we did not use p-values in the process; this is for example only
  output <- data.frame(OR, logOR_se, out_2.5, out_97.5, pvalue)
  colnames(output) <- c("OR", "SE", "LL", "UL", "P-value")
  return(output)
}
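The same calculation can be mirrored in plain Python to expose the moving parts (the response rates and SEs below are hypothetical; the log-OR SE combines the four variance terms in the same way as the R function above):

```python
import math

def comparison_arms(mean1, mean2, se1, se2):
    """Odds ratio of arm 1 vs arm 2 with a 95% CI, mirroring get_comparison_arms()."""
    odds_ratio = (mean1 / (1 - mean1)) / (mean2 / (1 - mean2))
    log_or = math.log(odds_ratio)
    log_or_se = math.sqrt(se1**2 / mean1**2 + se1**2 / (1 - mean1)**2
                          + se2**2 / mean2**2 + se2**2 / (1 - mean2)**2)
    z = 1.959963984540054  # qnorm(0.975)
    lower = math.exp(log_or - z * log_or_se)
    upper = math.exp(log_or + z * log_or_se)
    return odds_ratio, lower, upper

# Hypothetical inputs: 30% vs 50% response rates, each with SE 0.03
or_, ll, ul = comparison_arms(0.30, 0.50, 0.03, 0.03)
print(round(or_, 3), round(ll, 3), round(ul, 3))
```

Working on the log scale before forming the CI keeps the interval asymmetric around the OR, which is the standard behavior for ratio estimates.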
Running the Comparison Function
Show R Code
# OR before and after weighting
compIpdPre <- get_comparison_arms(
  mean1 = mean_trt_pre, # ORR for NIVO+RELA before weighting
  mean2 = ORR_DT,       # ORR for DAB+TRAM (reported value)
  se1 = se_trt_pre,     # Standard error of ORR for NIVO+RELA before weighting
  se2 = ORR_se_DT       # Standard error of ORR for DAB+TRAM
)
compIpdPost <- get_comparison_arms(
  mean1 = mean_trt_post, # ORR for NIVO+RELA after weighting
  mean2 = ORR_DT,        # ORR for DAB+TRAM (reported value)
  se1 = se_trt_post,     # Standard error of ORR for NIVO+RELA after weighting
  se2 = ORR_se_DT        # Standard error of ORR for DAB+TRAM
)

# Print OR results
print(compIpdPre) # Before weighting
OR SE LL UL P-value
(Intercept) 0.3710323 0.1404724 0.2817354 0.4886322 1.688159e-12
Show R Code
print(compIpdPost) # After weighting
OR SE LL UL P-value
(Intercept) 0.3826066 0.1540082 0.2829179 0.5174215 4.423721e-10
5.4.9.3.1 Interpretation of Odds Ratio
The odds ratios (ORs) are used to compare how likely patients in each cohort are to respond to treatment.
OR > 1 → NIVO+RELA patients are more likely to respond than DAB+TRAM.
OR < 1 → NIVO+RELA patients are less likely to respond than DAB+TRAM.
OR = 1 → No difference in response likelihood.
The before and after weighting comparison tells us how much adjusting for baseline imbalances affects the results. If the OR shifts after weighting, it suggests that the observed differences before weighting were partially due to differences in patient characteristics, rather than a true treatment effect.
5.4.9.4 Time-to-event outcome (OS) comparison
For time-to-event outcomes, we need patient-level survival data for both cohorts:
NIVO+RELA (IPD) → Simulated patient-level data.
DAB+TRAM (Comparator) → Reconstructed from published Kaplan-Meier curves
The Cox proportional hazards (Cox PH) model is used to estimate the hazard ratio (HR), which compares the risk of an event (e.g., death or disease progression) over time between the two treatment groups. The analysis is performed before and after weighting, allowing us to assess how adjusting for baseline differences affects the HR.
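As a small numeric aside (plain Python, hypothetical values): a log HR and its SE from a Cox model convert to an HR with a 95% CI exactly as the log OR does above, by exponentiating after forming the interval on the log scale.

```python
import math

log_hr = -0.72         # hypothetical log hazard ratio from a Cox PH model
se_log_hr = 0.14       # hypothetical standard error of the log HR
z = 1.959963984540054  # qnorm(0.975)

hr = math.exp(log_hr)
ci = (math.exp(log_hr - z * se_log_hr), math.exp(log_hr + z * se_log_hr))
print(round(hr, 3), round(ci[0], 3), round(ci[1], 3))
```

This is why the R code below extracts the log HR and its SE separately: all interval arithmetic happens on the log scale before exponentiating.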
5.4.9.4.1 Step 1: Preparing the Survival Data
We first merge the simulated NIVO+RELA dataset and reconstructed DAB+TRAM dataset into a single dataframe for analysis.
wt → Weighting factor (only relevant after MAIC adjustment)
5.4.9.4.2 Step 2: Cox PH Model Before Weighting
Before applying MAIC weights, we estimate the unadjusted hazard ratio (HR) to compare survival outcomes.
Show R Code
# Cox PH model comparing NIVO+RELA vs. DAB+TRAM (before weighting)
NivoRela_DabrTram_unadj <- coxph(Surv(OS_Time_Months, OS_Event) ~ trt, data = tte_os)

# Extract key results
NivoRela_DabrTram_unadj_logHR <- coef(NivoRela_DabrTram_unadj) # Log hazard ratio
NivoRela_DabrTram_unadj_logSE <- sqrt(NivoRela_DabrTram_unadj$var[1, 1]) # Log HR standard error
NivoRela_DabrTram_unadj_HR <- exp(coef(NivoRela_DabrTram_unadj)) # Convert log HR to HR
NivoRela_DabrTram_unadj_p <- summary(NivoRela_DabrTram_unadj)$coefficients[5] # p-value

# Print results
print(NivoRela_DabrTram_unadj_HR) # HR before weighting
trtNIVO+RELA
0.4849858
Show R Code
print(NivoRela_DabrTram_unadj_p) # P-value before weighting
[1] 3.661652e-07
Key Takeaways:
The Cox model estimates the HR → HR < 1 means NIVO+RELA reduces risk, while HR > 1 means DAB+TRAM has lower risk.
This initial estimate is unadjusted → It does not account for differences in baseline characteristics.
5.4.9.4.3 Step 3: Cox PH Model After Weighting
Now we apply MAIC weights to ensure that the weighted NIVO+RELA cohort matches the DAB+TRAM cohort in baseline characteristics. This allows for a fairer comparison.
Show R Code
# Cox PH model after weighting (accounting for baseline differences)
NivoRela_DabrTram_adj <- coxph(Surv(OS_Time_Months, OS_Event) ~ trt, data = tte_os,
                               weights = wt)

# Extract key results
NivoRela_DabrTram_adj_logHR <- coef(NivoRela_DabrTram_adj) # Log hazard ratio
NivoRela_DabrTram_adj_logSE <- sqrt(NivoRela_DabrTram_adj$var[1, 1]) # Log HR standard error
NivoRela_DabrTram_adj_HR <- exp(coef(NivoRela_DabrTram_adj)) # Convert log HR to HR
NivoRela_DabrTram_adj_p <- summary(NivoRela_DabrTram_adj)$coefficients[6] # p-value (robust-SE column)

# Print results
print(NivoRela_DabrTram_adj_HR) # HR after weighting
trtNIVO+RELA
0.4968978
Show R Code
print(NivoRela_DabrTram_adj_p) # P-value after weighting
[1] 4.384994e-06
5.4.10 Key Takeaways
Tip
Why Is This Important?
After weighting, the NIVO+RELA cohort should be better balanced with the DAB+TRAM cohort in terms of their patient characteristics.
A shift in HR after weighting indicates that some of the initial differences in survival were due to imbalances in baseline characteristics, rather than treatment effect alone.
6 Conclusion
Final Thoughts
This supplemental document provides a detailed, reproducible framework for performing MAICs using simulated and reconstructed patient-level data. We demonstrated core steps of the methodology, including:
✅ Extraction and digitization of published Kaplan-Meier curves
✅ Reconstruction of individual patient-level survival data
✅ Simulation of patient-level data when original datasets are unavailable
✅ Application of MAIC weighting to adjust for baseline imbalances
✅ Comparison of binary and time-to-event outcomes across treatment cohorts
Through this worked example, we illustrate how these techniques can support comparative effectiveness research when direct head-to-head trial data are not available. Importantly, this example incorporates simulated patient-level data for the NIVO+RELA cohort, used solely for illustrative purposes due to the unavailability of individual patient data from RELATIVITY-047.
While the data presented here do not represent real patients or actual trial results, the analytical approach, code, and workflow provide a foundation that can be adapted to other MAIC applications. This framework is generalizable to other treatment comparisons, outcomes, and therapeutic areas, offering a flexible toolset for researchers navigating the challenges of indirect comparisons.
For validated results and interpretations, please refer to our main publication. Researchers interested in applying these methods to real-world data are strongly encouraged to consult the primary literature and statistical experts to ensure rigor and accuracy.
7 References
Sources
Dummer R, et al. Encorafenib plus binimetinib versus vemurafenib or encorafenib in patients with BRAF-mutant melanoma (COLUMBUS): a multicentre, open-label, randomised phase 3 trial. Lancet Oncology. 2018;19(5):603-615. doi:10.1016/S1470-2045(18)30142-6
Guyot P, Ades A, Ouwens M, Welton N. Enhanced secondary analysis of survival data: reconstructing the data from published Kaplan–Meier survival curves. BMC Med Res Methodol. 2012;12:9. doi:https://doi.org/10.1186/1471-2288-12-9
Phillippo D, Ades T, Dias S, Palmer S, Abrams K, Welton N. NICE DSU technical support document 18: methods for population-adjusted indirect comparisons in submissions to NICE. NICE Decision Support Unit. 2016.
Signorovitch J, Ayyagari R, Cheng D, Wu E. Matching-adjusted indirect comparisons: a simulation study of statistical performance. Value Health. 2013;16(3):A48.
8 Disclosures
Use of Generative AI
This supplemental methods and worked example were developed with the assistance of OpenAI to generate code and text, helping to structure explanations, annotate key concepts, and improve readability. While all statistical methodologies, data interpretations, and results remain the responsibility of the authors, AI-assisted tools were used to enhance clarity and streamline the documentation of this MAIC approach. Users should refer to the original publication for validated results and consult with statistical experts before applying these methods to real-world data.